1. The Data Processing Pipeline
The analysis code begins with the data processing pipeline. This pipeline starts by downloading the latest economic indicators and ends by outputting a set of features that can be input into the VAR modeling pipeline.
Its main purpose is to identify the series of transformations that will make each time series indicator stationary so that the indicators are suitable for use in a vector autoregression.
The code for the pipeline is available at src/fyp_analysis/pipelines/data_processing/.
Running the Pipeline
To run the pipeline, execute:
where dp is short for "data processing".
Parameters
The parameters for the data processing pipeline can be set in the file conf/base/parameters/data_processing.yml.
The parameters are:
- fresh_indicators: whether to download fresh economic indicators
- seasonal_adjustments: the names of the columns to apply seasonal adjustments to
- min_feature_year: the minimum year to trim the indicators to
Steps
This section outlines the steps (also called nodes) in the data processing pipeline. The steps are defined in the src/fyp_analysis/pipelines/data_processing/pipeline.py file. In this file, we define the function to run for each step, as well as the inputs and outputs of each function.
This pipeline will download the latest version of a set of economic indicators, perform various transformations, and output a set of features suitable to be used as input to the modeling pipeline.
Warning
Make sure you have properly set up your local API credentials before running this pipeline. Otherwise, you won't be able to download all of the necessary indicators. See the setup instructions for more information.
In Python, the pipeline is defined as follows:
def create_pipeline(**kwargs):
    return Pipeline(
        [
            node(
                func=get_economic_indicators,
                inputs="params:fresh_indicators",
                outputs="economic_indicators",
                name="economic_indicators_node",
            ),
            node(
                func=get_quarterly_averages,
                inputs="economic_indicators",
                outputs="quarterly_features_raw",
                name="quarterly_features_raw_node",
            ),
            node(
                func=impute_cbo_values,
                inputs=[
                    "quarterly_features_raw",
                    "params:plan_start_year",
                    "params:cbo_forecast_date",
                ],
                outputs="quarterly_features_cbo_imputed",
                name="impute_cbo_node",
            ),
            node(
                func=combine_features_and_bases,
                inputs=["quarterly_features_cbo_imputed", "plan_details"],
                outputs="features_and_bases",
                name="combine_features_bases_node",
            ),
            node(
                func=seasonally_adjust_features,
                inputs=["features_and_bases", "params:seasonal_adjustments"],
                outputs="features_and_bases_sa",
                name="seasonal_adjustment_node",
            ),
            node(
                func=get_stationary_guide,
                inputs="features_and_bases_sa",
                outputs="stationary_guide",
                name="stationary_guide_node",
            ),
            node(
                func=get_final_unscaled_features,
                inputs=["features_and_bases_sa", "params:min_feature_year"],
                outputs="final_unscaled_features",
                name="final_unscaled_features_node",
            ),
            node(
                func=get_final_scaled_features,
                inputs=[
                    "final_unscaled_features",
                    "stationary_guide",
                ],
                outputs="final_scaled_features",
                name="final_scaled_features_node",
            ),
        ]
    )
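The wiring between nodes can be checked without running anything. The sketch below is a dependency-free illustration, not part of the project: it copies the node names, inputs, and outputs from the definition above into plain tuples, and the helper functions `external_inputs` and `upstream_of` are hypothetical names introduced here.

```python
# Each tuple mirrors one node in the pipeline definition:
# (node name, list of input names, output name).
NODES = [
    ("economic_indicators_node", ["params:fresh_indicators"], "economic_indicators"),
    ("quarterly_features_raw_node", ["economic_indicators"], "quarterly_features_raw"),
    (
        "impute_cbo_node",
        ["quarterly_features_raw", "params:plan_start_year", "params:cbo_forecast_date"],
        "quarterly_features_cbo_imputed",
    ),
    (
        "combine_features_bases_node",
        ["quarterly_features_cbo_imputed", "plan_details"],
        "features_and_bases",
    ),
    (
        "seasonal_adjustment_node",
        ["features_and_bases", "params:seasonal_adjustments"],
        "features_and_bases_sa",
    ),
    ("stationary_guide_node", ["features_and_bases_sa"], "stationary_guide"),
    (
        "final_unscaled_features_node",
        ["features_and_bases_sa", "params:min_feature_year"],
        "final_unscaled_features",
    ),
    (
        "final_scaled_features_node",
        ["final_unscaled_features", "stationary_guide"],
        "final_scaled_features",
    ),
]


def external_inputs(nodes):
    """Return inputs that no node produces (parameters and pre-existing datasets)."""
    produced = {output for _, _, output in nodes}
    needed = {name for _, inputs, _ in nodes for name in inputs}
    return sorted(needed - produced)


def upstream_of(dataset, nodes):
    """Return the names of all nodes that must run to produce `dataset`."""
    producer = {output: (name, inputs) for name, inputs, output in nodes}
    required, stack = [], [dataset]
    while stack:
        current = stack.pop()
        if current in producer:
            name, inputs = producer[current]
            if name not in required:
                required.append(name)
                stack.extend(inputs)
    return required


print(external_inputs(NODES))
print(upstream_of("stationary_guide", NODES))
```

Apart from the `params:` entries, the only external input is `plan_details`, so that dataset must already exist before the pipeline runs.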
Reminder
If you are working with IPython or in a Jupyter notebook, you can load any named dataset (the inputs/outputs above) using the catalog.load() function. For example, to load the "economic_indicators" dataset (the output from step 1), call catalog.load("economic_indicators").
Step 1: Download indicators
- Function: get_economic_indicators()
- Purpose: Download the latest set of economic indicators and save them locally
- Inputs:
  - Parameter: fresh_indicators
- Outputs:
  - Dataset: economic_indicators in the data/02_intermediate/ folder
Economic indicators are defined in the src/fyp_analysis/pipelines/data_processing/indicators/sources folder. Right now, there are various sources, including FRED, Quandl, CARTO (Philadelphia open data), and Zillow, with a JSON file for each source that lists the information necessary for download. New indicators can be added by adding a new entry to the appropriate JSON file.
The current set of indicators includes the following:
name | description | source | frequency | geography |
---|---|---|---|---|
ActivityLicensesPhilly | Commercial activity licenses for the City of Philadelphia | carto | monthly | Philadelphia |
BizLicensesPhilly | Business licenses for the City of Philadelphia | carto | monthly | Philadelphia |
BuildingPermitsPhilly | New construction permits for the City of Philadelphia | carto | monthly | Philadelphia |
DeedTransfersPhilly | Deed real estate transfers for the City of Philadelphia | carto | monthly | Philadelphia |
10YearTreasury | 10-Year Treasury Constant Maturity Rate | fred | monthly | national |
3MonthTreasury | 3-Month Treasury Bill: Secondary Market Rate | fred | monthly | national |
AlcoholSales | Retail Sales: Beer, Wine, and Liquor Stores | fred | quarterly | national |
BuildingPermitsPhillyMSA | New Private Housing Units Authorized by Building Permits for Philadelphia-Camden-Wilmington, PA-NJ-DE-MD (MSA) | fred | quarterly | Philadelphia MSA |
CPIPhillyMSA | Consumer Price Index for All Urban Consumers: All Items in Philadelphia-Camden-Wilmington, PA-NJ-DE-MD | fred | monthly | Philadelphia MSA |
CPIU | Consumer Price Index for All Urban Consumers: All Items in U.S. City Average | fred | monthly | national |
CarSales | Total Vehicle Sales | fred | daily | national |
ConsumerConfidence | Consumer Opinion Surveys: Confidence Indicators: Composite Indicators: OECD Indicator for the United States | fred | monthly | national |
ContinuedClaimsPA | Continued Claims (Insured Unemployment) in Pennsylvania | fred | weekly | state |
CorporateProfits | Corporate Profits with Inventory Valuation Adjustment (IVA) and Capital Consumption Adjustment (CCAdj) | fred | quarterly | national |
EconomicConditionsPhillyMSA | Economic Conditions Index for Philadelphia-Camden-Wilmington, PA-NJ-DE-MD (MSA) | fred | monthly | Philadelphia MSA |
EmploymentCostIndex | Employment Cost Index: Wages and Salaries: Private Industry Workers | fred | quarterly | national |
FHFAHousePriceIndex | Purchase Only House Price Index for the United States | fred | quarterly | national |
FedFundsRate | Effective Federal Funds Rate | fred | monthly | national |
GDP | Gross Domestic Product | fred | quarterly | national |
GDPPriceIndex | Gross Domestic Product: Chain-type Price Index | fred | quarterly | national |
GovtSocialBenefits | Federal government current transfer payments: Government social benefits: to persons | fred | quarterly | national |
HousePriceIndexPhillyMSA | All-Transactions House Price Index for Philadelphia, PA (MSAD) | fred | quarterly | Philadelphia MSA |
HousingStarts | Housing Starts: Total: New Privately Owned Housing Units Started | fred | monthly | national |
HousingSupply | Monthly Supply of Houses in the United States | fred | monthly | national |
InitialClaimsPA | Initial Claims in Pennsylvania | fred | weekly | state |
JobOpenings | Job Openings: Total Nonfarm | fred | monthly | national |
ManufacturingHoursWorked | Weekly Hours Worked: Manufacturing for the United States | fred | quarterly | national |
NYCGasPrice | Conventional Gasoline Prices: New York Harbor, Regular | fred | daily | national |
NewManufacturingOrders | Manufacturers' New Orders: Nondefense Capital Goods Excluding Aircraft | fred | monthly | national |
NonfarmEmployeesPhilly | All Employees: Total Nonfarm in Philadelphia City, PA | fred | monthly | Philadelphia |
NonfarmEmployeesPhillyMSA | All Employees: Total Nonfarm in Philadelphia-Camden-Wilmington, PA-NJ-DE-MD (MSA) | fred | monthly | Philadelphia MSA |
NonfarmEmployment | All Employees, Total Nonfarm | fred | monthly | national |
NonresidentialInvestment | Private Nonresidential Fixed Investment | fred | quarterly | national |
OilPriceWTI | Crude Oil Prices: West Texas Intermediate (WTI) - Cushing, Oklahoma | fred | monthly | national |
PCE | Personal Consumption Expenditures | fred | monthly | national |
PCEPriceIndex | Personal Consumption Expenditures: Chain-type Price Index | fred | monthly | national |
PPI | Producer Price Index for All Commodities | fred | monthly | national |
PersonalIncome | Personal Income | fred | quarterly | national |
PersonalIncomePhillyMSA | Per Capita Personal Income in Philadelphia County/city, PA | fred | annual | Philadelphia MSA |
PersonalSavingsRate | Personal Savings Rate | fred | monthly | national |
PopulationPhilly | Resident Population in Philadelphia County/city, PA | fred | annual | Philadelphia |
PrimeEPOP | Employment-Population Ratio | fred | monthly | national |
RealDisposablePersonalIncome | Real Disposable Personal Income | fred | monthly | national |
RealGDP | Real Gross Domestic Product | fred | quarterly | national |
RealGDPPhillyMSA | Total Real Gross Domestic Product for Philadelphia-Camden-Wilmington, PA-NJ-DE-MD | fred | annual | Philadelphia MSA |
RealRetailFoodServiceSales | Advance Real Retail and Food Services Sales | fred | monthly | national |
ResidentialInvestment | Private Residential Fixed Investment | fred | quarterly | national |
SahmRule | Real-time Sahm Rule Recession Indicator | fred | monthly | national |
TotalBusinessSales | Total Business Sales | fred | monthly | national |
UncertaintyIndex | Economic Policy Uncertainty Index for United States | fred | monthly | national |
UnemploymentPhilly | Unemployment Rate in Philadelphia County/City, PA | fred | monthly | Philadelphia |
UnemploymentPhillyMSA | Unemployment Rate in Philadelphia-Camden-Wilmington, PA-NJ-DE-MD | fred | monthly | Philadelphia MSA |
UnemploymentRate | Unemployment Rate | fred | monthly | national |
Wage&Salaries | Compensation of Employees, Received: Wage and Salary Disbursements | fred | monthly | national |
WagesPhillyMSA | Average Weekly Wages for Employees in Total Covered Establishments in Philadelphia-Camden-Wilmington, PA-NJ-DE-MD | fred | quarterly | Philadelphia MSA |
WeeklyEconomicIndex | Weekly Economic Index (Lewis-Mertens-Stock) | fred | weekly | national |
YieldCurve | 10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity | fred | daily | national |
SP500 | Monthly S&P 500 Price | quandl | monthly | national |
HousingInventoryPhillyMSA | For-Sale Inventory (Smooth, All Homes, Monthly) | zillow | monthly | Philadelphia MSA |
MeanDaysToSalePhillyMSA | Mean Days to Pending (Smooth, All Homes, Monthly) | zillow | monthly | Philadelphia MSA |
MedianHomeValuePhilly | ZHVI All Homes (SFR, Condo/Co-op) Time Series | zillow | monthly | Philadelphia |
MedianListPricePhillyMSA | Median List Price (Smooth, All Homes, Monthly) | zillow | monthly | Philadelphia MSA |
RentIndexPhillyMSA | ZORI (Smoothed, Seasonally Adjusted) All Homes Plus Multifamily | zillow | monthly | Philadelphia MSA |
Step 2: Get quarterly averages
- Function: get_quarterly_averages()
- Purpose: Get the quarterly averages of the indicators and remove any indicators with annual frequency.
- Inputs:
  - Dataset: economic_indicators
- Outputs:
  - Dataset: quarterly_features_raw in the data/02_intermediate/ folder
Step 3: Impute CBO values
- Function: impute_cbo_values()
- Purpose: Impute CBO forecast values for Q4 of the current fiscal year.
- Inputs:
  - Dataset: quarterly_features_raw
  - Parameter: plan_start_year
  - Parameter: cbo_forecast_date
- Outputs:
  - Dataset: quarterly_features_cbo_imputed in the data/02_intermediate/ folder
For economic indicators that CBO publishes projections for, this step imputes the forecast value for Q4 of the current fiscal year when an actual value is not yet available.
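The averaging performed by get_quarterly_averages() can be illustrated with a small, dependency-free sketch. It is not the project's implementation: the function name `quarterly_averages`, the use of calendar quarters, and the sample values are all assumptions made for illustration, and the removal of annual-frequency indicators is not shown.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def quarterly_averages(observations):
    """Average (date, value) observations within each calendar quarter.

    Returns a dict keyed by (year, quarter) in chronological order.
    """
    buckets = defaultdict(list)
    for when, value in observations:
        quarter = (when.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        buckets[(when.year, quarter)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Made-up monthly observations spanning two quarters.
monthly = [
    (date(2021, 1, 1), 3.0),
    (date(2021, 2, 1), 4.0),
    (date(2021, 3, 1), 5.0),
    (date(2021, 4, 1), 6.0),
    (date(2021, 5, 1), 8.0),
]
print(quarterly_averages(monthly))
# {(2021, 1): 4.0, (2021, 2): 7.0}
```

The same function handles daily and weekly series, since it groups on the observation date rather than on a fixed number of rows per quarter.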
Step 4: Combine indicators and tax bases
- Function: combine_features_and_bases()
- Purpose: Combine the economic indicator features and the tax base data into a single data frame.
- Inputs:
  - Dataset: quarterly_features_cbo_imputed
  - Dataset: plan_details
- Outputs:
  - Dataset: features_and_bases in the data/02_intermediate/ folder
Step 5: Seasonally adjust features
- Function: seasonally_adjust_features()
- Purpose: Seasonally adjust the specified columns, using the LOESS functionality in statsmodels.
- Inputs:
  - Dataset: features_and_bases
  - Parameter: seasonal_adjustments
- Outputs:
  - Dataset: features_and_bases_sa in the data/02_intermediate/ folder
Step 6: Calculate stationary guide
- Function: get_stationary_guide()
- Purpose: Make the stationary guide, a spreadsheet that contains the instructions for making each feature stationary.
- Inputs:
  - Dataset: features_and_bases_sa
- Outputs:
  - Dataset: stationary_guide in the data/02_intermediate/ folder
For each feature, the stationary guide contains the following information:
- Can we take the log of the variable (e.g., is it non-negative)?
- How many differences are needed to make it stationary?
- Should we normalize the data first?
The spreadsheet is available in the data/02_intermediate/ folder.
This step also creates diagnostic stationary plots for all tax bases and saves them to data/02_intermediate/stationary_figures. These figures show the autocorrelation and partial autocorrelation of each time series. One example is the stationary figure for the Wage Tax.
Step 7: Final unscaled features
- Function: get_final_unscaled_features()
- Purpose: Get the final unscaled features to input into the modeling pipeline. The only additional preprocessing performed in this step is trimming to the specified minimum year for all features and tax bases.
- Inputs:
  - Dataset: features_and_bases_sa
  - Parameter: min_feature_year
- Outputs:
  - Dataset: final_unscaled_features in the data/03_feature/ folder
Step 8: Final scaled features
- Function: get_final_scaled_features()
- Purpose: Get the final scaled features to input into the modeling pipeline. This applies the final preprocessor based on the "stationary guide." For each feature, it takes the log of the feature if possible; if not, it applies a normalization. Finally, the preprocessor differences the feature until it is stationary.
- Inputs:
  - Dataset: final_unscaled_features
  - Dataset: stationary_guide
- Outputs:
  - Dataset: final_scaled_features in the data/03_feature/ folder
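The preprocessing described for get_final_scaled_features() (log if possible, otherwise normalize, then difference) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's preprocessor: the function names, the guide layout of `(can_log, n_diffs)`, the feature names, and the reading of "normalization" as a z-score are all hypothetical.

```python
import math
from statistics import mean, pstdev


def difference(values, order):
    """Apply first differencing `order` times; each pass shortens the list by one."""
    for _ in range(order):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


def scale_feature(values, can_log, n_diffs):
    """Log-transform if allowed, otherwise standardize, then difference."""
    if can_log:
        transformed = [math.log(v) for v in values]
    else:
        # Assumed normalization: subtract the mean, divide by the standard deviation.
        center, spread = mean(values), pstdev(values)
        transformed = [(v - center) / spread for v in values]
    return difference(transformed, n_diffs)


# Hypothetical guide rows: feature name -> (can_log, n_diffs).
guide = {"ExampleLevel": (True, 1), "ExampleRate": (False, 1)}

# Made-up feature values; ExampleRate goes negative, so it cannot be logged.
features = {
    "ExampleLevel": [100.0, 110.0, 121.0],
    "ExampleRate": [-1.0, 0.0, 1.0],
}

scaled = {name: scale_feature(features[name], *guide[name]) for name in features}
print(scaled)
```

For ExampleLevel, which grows 10% per period, the differenced log is a constant log(1.1). That constant growth rate is the reason a log followed by one difference is a common route to stationarity for level series.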