4. The Forecasting Pipeline
The forecast pipeline runs the vector auto-regressions and produces the final forecasts for each tax revenue. Each of the three previous steps must be performed before this step.
After identifying the best-fit set of
forecast parameters for each tax in the exploratory step,
this pipeline will run the VAR model for each tax with these parameters to
generate the final fits. It outputs a summary file of
the revenue forecasts: data/07_reporting/revenue_summary.xlsx
.
The pipeline relies on the statsmodels
package to calculate vector
autoregressions (VAR) for each tax base. For more information on statsmodels
,
see the documentation for VARs.
The code for the pipeline is available in:
src/fyp_analysis/pipelines/forecast/
(link)
Running the Pipeline
To run the forecast pipeline, execute:
Parameters
The parameters for the data processing pipeline can be set in the
file: conf/base/parameters/forecast.yml
(link). The parameters are:
- max_fit_date: The maximum date to include in the VAR fits.
- forecast_types: For each tax base, what kind of forecast to run, either "var" or "file". This will depend on what notebook you used in the exploratory phase. If you generated a tax base forecast not from VAR fits in the notebook for a tax, you will want to specify "file" here. See for example, the realty transfer tax forecast notebook.
Steps
This section outlines the steps (also called nodes) in the forecast pipeline. The steps are defined in the src/fyp_analysis/pipelines/forecast/pipeline.py file. In this file, we define the function to run for each node, as well as the inputs and outputs.
This pipeline will calculate a set of forecasts and output a summary of the results.
In python, the pipeline is defined as follows:
def create_pipeline(**kwargs):
return Pipeline(
[
node(
func=run_forecasts,
inputs=[
"final_unscaled_features",
"stationary_guide",
"params:plan_start_year",
"params:plan_type",
"params:cbo_forecast_date",
"params:forecast_types",
],
outputs=["tax_base_forecasts", "tax_revenue_forecasts"],
name="forecasting_node",
),
node(
func=report_forecast_results,
inputs=[
"params:plan_start_year",
"tax_revenue_forecasts",
"tax_base_forecasts",
],
outputs=None,
name="reporting_node",
),
]
)
Step 1: Run forecasts
- Function:
run_forecasts()
- Purpose: Run the forecasts for each tax base.
- Inputs:
- Dataset:
final_scaled_features
- Dataset:
grangers_matrix
- Dataset:
- Outputs:
- Dataset:
tax_base_forecasts
, saved as06_model_output/final_tax_bases.csv
- Dataset:
tax_revenue_forecasts
, saved as06_model_output/final_tax_revenues.csv
- Dataset:
Step 2: Summarize the results
- Function:
report_forecast_results()
- Purpose: Create the summary file of the revenue forecasts
- Inputs:
- Parameter:
plan_start_year
- Dataset:
tax_base_forecasts
- Dataset:
tax_revenue_forecasts
- Parameter:
- Outputs:
- This outputs a summary excel file
data/07_reporting/revenue_summary.xlsx
- This outputs a summary excel file