The data/ Folder
The data/ folder holds the data inputs and outputs of the analysis
pipelines. One folder holds the raw input data (e.g., historical tax revenues
and economic indicators), and the remaining folders hold intermediate data
products and modeling results.
Data Layers
The data/ folder follows the data engineering convention used by the kedro
package, which specifies a way of organizing data into layers. The data layers
are:
Note: We are not using the "primary" data layer in this analysis.
There are three folders associated with the data processing pipeline:
01_raw
: The raw data files that serve as the starting point for the analysis. This is the ground truth data that should not be modified by the analysis in any way.
02_intermediate
: Intermediate data products produced by the data processing pipeline.
03_feature
: The final unscaled and scaled versions of the features that will be input into the modeling pipeline.
There are four folders associated with the modeling pipeline:
04_model_input
: Data, other than the features, that is input into the modeling pipeline.
05_models
: Regression models.
06_model_output
: Files output by the regression models.
07_reporting
: Files for reporting final results.
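As a rough illustration, the layer layout listed above could be checked with a short script. This is only a sketch: the LAYER_FOLDERS constant and the missing_layers helper are hypothetical names that belong to neither this project nor kedro, and the folder names are copied from the lists above.

```python
from pathlib import Path

# Layer folder names as listed above (the "primary" layer is not used).
LAYER_FOLDERS = [
    "01_raw",
    "02_intermediate",
    "03_feature",
    "04_model_input",
    "05_models",
    "06_model_output",
    "07_reporting",
]


def missing_layers(data_dir):
    """Return the layer folders that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in LAYER_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_layers("data")
    print("Missing layer folders:", missing or "none")
```

An empty result means every expected layer folder is present under the given directory.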
Raw Data
There are four folders in the 01_raw folder:
cbo
: CBO 10-year economic projection spreadsheets
historical
: Historical tax rates and revenue data
indicators
: The economic indicator data
plans
: The data from the Five Year Plan, e.g., projected revenues and rates over the Plan; this comes in two flavors: the "proposed" and the "adopted" versions
Historical Data Files
The data/01_raw/historical/ folder contains the historical data necessary for
the analysis. In particular, there is historical tax revenue data, tax rate
data, and data on the breakdown between the net income and gross receipts
portions of BIRT.
Revenues
In the data/01_raw/historical/revenues/ folder, there are spreadsheets holding
the annual and quarterly tax revenue data.
The quarterly data file is the main input data file required for the analysis. The annual file is not required for the analysis but is useful for tracking the audited annual totals for each tax. Each year, these data files need to be updated with the latest historical data. See the update usage instructions for more information.
Warning
The data for the Wage & Earnings Tax and the Net Profits Tax are for the current year only (prior-year totals are excluded). Keep this in mind when updating the historical data files.
Rates
The data/01_raw/historical/rates/ folder contains one CSV file per tax, each
holding the historical tax rate for each fiscal year.
These files contain only historical rates, not projected rates. Each year, the rate for the latest fiscal year should be added to each file.
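The yearly update described above amounts to adding one row to each rate file. The sketch below shows one way to do that with the Python standard library. The column names (fiscal_year, rate) and the add_rate helper are assumptions made for illustration, since the actual layout of the CSV files is not described here.

```python
import csv
from pathlib import Path


def add_rate(csv_path, fiscal_year, rate):
    """Add the rate for one fiscal year, refusing duplicate years.

    Assumes a header row that includes the hypothetical columns
    'fiscal_year' and 'rate'; adjust these to the real file layout.
    """
    path = Path(csv_path)

    # Read the existing rows and remember the header order.
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    if any(int(row["fiscal_year"]) == fiscal_year for row in rows):
        raise ValueError(f"FY{fiscal_year} is already in {path.name}")

    rows.append({"fiscal_year": fiscal_year, "rate": rate})

    # Rewrite the whole file so the result is well formed even if the
    # original file did not end with a newline.
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Calling add_rate with the path to a rate file, a fiscal year, and a rate adds the row; calling it again for the same fiscal year raises a ValueError instead of writing a duplicate.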
BIRT Splits
The analysis requires the historical breakdown between the net income and gross receipts portions of BIRT. The information for the latest fiscal year can be obtained from the BIRT revenue model sent over by the Budget Office each year.
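If the breakdown is recorded as each portion's share of total BIRT revenue, the arithmetic is a simple ratio. The sketch below assumes that interpretation; the birt_shares helper and its inputs are hypothetical and are not taken from the Budget Office model or from this project's code.

```python
def birt_shares(net_income, gross_receipts):
    """Return (net_income_share, gross_receipts_share) of total BIRT revenue.

    Both inputs are revenue amounts in the same units for one fiscal year.
    """
    total = net_income + gross_receipts
    if total <= 0:
        raise ValueError("total BIRT revenue must be positive")
    return net_income / total, gross_receipts / total
```

The two shares always sum to one, so only one of them needs to be stored if the files record a single fraction per fiscal year.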
Indicators
The raw economic indicators are stored in the data/01_raw/indicators/ folder.
Other than the historical tax revenue data, this is the main source of input
data for the analysis.
CBO
The 10-year economic projections from the CBO are stored in the data/01_raw/cbo/ folder. The CBO variables are used as exogenous variables in the regression modeling.
Plans
The data/01_raw/plans/ folder holds the Five Year Plan projection information
for revenues and rates. There should be separate files for the proposed and
adopted versions of the Plan.
Note
For more information on these files, see the Plan Details section.