Chapter 8 Testing the Convergence Hypothesis
8.1 Introduction
We provide an additional empirical example of partialling-out with Lasso to estimate the regression coefficient \(\beta_1\) in the high-dimensional linear regression model: \[ Y = \beta_1 D + \beta_2'W + \epsilon. \]
Specifically, we are interested in how the rates at which economies of different countries grow (\(Y\)) are related to the initial wealth levels in each country (\(D\)), controlling for each country’s institutional, educational, and other similar characteristics (\(W\)).
The relationship is captured by \(\beta_1\), the speed of convergence/divergence, which measures the speed at which poor countries catch up with \((\beta_1 < 0)\) or fall behind \((\beta_1 > 0)\) rich countries, after controlling for \(W\). Our inference question here is: do poor countries grow faster than rich countries, controlling for educational and other characteristics? In other words, is the speed of convergence negative, \(\beta_1 < 0\)? This is the Convergence Hypothesis predicted by the Solow Growth Model, a structural economic model. Under some strong assumptions, which we won’t state here, the predictive exercise we are doing here can be given a causal interpretation.
The outcome \(Y\) is the realized annual growth rate of a country’s wealth (Gross Domestic Product per capita). The target regressor (\(D\)) is the initial level of the country’s wealth. The target parameter \(\beta_1\) is the speed of convergence, which measures the speed at which poor countries catch up with rich countries. The controls (\(W\)) include measures of education levels, quality of institutions, trade openness, and political stability in the country.
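The Double Lasso procedure applied below rests on the partialling-out (Frisch–Waugh–Lovell) idea; written out as equations (a standard restatement of the model above, not an additional assumption):
\[
\tilde{Y} = Y - E[Y \mid W], \qquad \tilde{D} = D - E[D \mid W], \qquad \tilde{Y} = \beta_1 \tilde{D} + \epsilon,
\]
so \(\beta_1\) can be recovered by regressing the residual \(\tilde{Y}\) on the residual \(\tilde{D}\). In the high-dimensional setting below, the two conditional expectations (linear projections on \(W\)) are estimated with Lasso, and the final step is an ordinary least squares regression of the estimated residuals.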
8.2 Data analysis
We consider the data set GrowthData, which is included in the R package hdm. First, let us load the data set to get familiar with the data.
library(hdm)
library(xtable)
import pandas as pd
import numpy as np
import pyreadr
import statsmodels.api as sm
import statsmodels.formula.api as smf
Import data
growth <- GrowthData
attach(growth)
# check variables
names(growth)
## [1] "Outcome" "intercept" "gdpsh465" "bmp1l" "freeop" "freetar"
## [7] "h65" "hm65" "hf65" "p65" "pm65" "pf65"
## [13] "s65" "sm65" "sf65" "fert65" "mort65" "lifee065"
## [19] "gpop1" "fert1" "mort1" "invsh41" "geetot1" "geerec1"
## [25] "gde1" "govwb1" "govsh41" "gvxdxe41" "high65" "highm65"
## [31] "highf65" "highc65" "highcm65" "highcf65" "human65" "humanm65"
## [37] "humanf65" "hyr65" "hyrm65" "hyrf65" "no65" "nom65"
## [43] "nof65" "pinstab1" "pop65" "worker65" "pop1565" "pop6565"
## [49] "sec65" "secm65" "secf65" "secc65" "seccm65" "seccf65"
## [55] "syr65" "syrm65" "syrf65" "teapri65" "teasec65" "ex1"
## [61] "im1" "xr65" "tot1"
# I downloaded the data that the author used
growth_read = pyreadr.read_r("./data/GrowthData.RData")
# Extract the data frame from growth_read
growth = growth_read['GrowthData']
# check variables
list(growth)
## ['Outcome', 'intercept', 'gdpsh465', 'bmp1l', 'freeop', 'freetar', 'h65', 'hm65', 'hf65', 'p65', 'pm65', 'pf65', 's65', 'sm65', 'sf65', 'fert65', 'mort65', 'lifee065', 'gpop1', 'fert1', 'mort1', 'invsh41', 'geetot1', 'geerec1', 'gde1', 'govwb1', 'govsh41', 'gvxdxe41', 'high65', 'highm65', 'highf65', 'highc65', 'highcm65', 'highcf65', 'human65', 'humanm65', 'humanf65', 'hyr65', 'hyrm65', 'hyrf65', 'no65', 'nom65', 'nof65', 'pinstab1', 'pop65', 'worker65', 'pop1565', 'pop6565', 'sec65', 'secm65', 'secf65', 'secc65', 'seccm65', 'seccf65', 'syr65', 'syrm65', 'syrf65', 'teapri65', 'teasec65', 'ex1', 'im1', 'xr65', 'tot1']
We determine the dimension of our data set.
dim(growth)
## [1] 90 63
growth.shape
## (90, 63)
8.2.1 OLS
The sample contains \(90\) countries and \(63\) variables, about \(60\) of which are controls. Thus \(p \approx 60\), \(n=90\), and \(p/n\) is not small. We expect the least squares method to provide a poor estimate of \(\beta_1\), and we expect the method based on partialling-out with Lasso to provide a high-quality estimate of \(\beta_1\). To check this hypothesis, we analyze the relation between the outcome variable \(Y\) and the country characteristics by running a linear regression in the first step.
reg.ols <- lm(Outcome ~ . - 1, data = growth)
# We create the main variables
y = growth['Outcome']
X = growth.drop('Outcome', axis=1)
# OLS regression
reg_ols = sm.OLS(y, X).fit()
We extract the estimated regression coefficient \(\beta_1\) on the target regressor gdpsh465 (\(D\)), its standard error, and its 95% confidence interval.
# output: estimated regression coefficient corresponding to the target regressor
est_ols <- summary(reg.ols)$coef["gdpsh465", 1]

# output: std. error
std_ols <- summary(reg.ols)$coef["gdpsh465", 2]

# output: 95% confidence interval (row 2 corresponds to gdpsh465)
ci_ols <- confint(reg.ols)[2, ]

results_ols <- as.data.frame(cbind(est_ols, std_ols, ci_ols[1], ci_ols[2]))
colnames(results_ols) <- c("estimator", "standard error", "lower bound CI", "upper bound CI")
rownames(results_ols) <- c("OLS")
# output: estimated regression coefficient corresponding to the target regressor
est_ols = reg_ols.summary2().tables[1]['Coef.']['gdpsh465']

# output: std. error
std_ols = reg_ols.summary2().tables[1]['Std.Err.']['gdpsh465']

# output: 95% confidence interval
lower_ci = reg_ols.summary2().tables[1]['[0.025']['gdpsh465']
upper_ci = reg_ols.summary2().tables[1]['0.975]']['gdpsh465']
table <- matrix(0, 1, 4)
table[1, 1:4] <- c(est_ols, std_ols, ci_ols[1], ci_ols[2])
colnames(table) <- c("estimator", "standard error", "lower bound CI", "upper bound CI")
rownames(table) <- c("OLS")
tab <- xtable(table, digits = 3)
print(tab, type = "html") # set type="latex" for printing table in LaTeX
## <!-- html table generated in R 4.0.4 by xtable 1.8-4 package -->
## <!-- Wed Nov 24 12:30:35 2021 -->
## <table border=1>
## <tr> <th> </th> <th> estimator </th> <th> standard error </th> <th> lower bound CI </th> <th> upper bound CI </th> </tr>
## <tr> <td align="right"> OLS </td> <td align="right"> -0.009 </td> <td align="right"> 0.030 </td> <td align="right"> -0.071 </td> <td align="right"> 0.052 </td> </tr>
## </table>
table_1 = np.zeros((1, 4))
table_1[0, 0] = est_ols
table_1[0, 1] = std_ols
table_1[0, 2] = lower_ci
table_1[0, 3] = upper_ci
table_1_pandas = pd.DataFrame(table_1, columns = ["Estimator", "Std. Error", "lower bound CI", "upper bound CI"])
table_1_pandas.index = ["OLS"]
table_1_html = table_1_pandas.to_html()
table_1_html
## '<table border="1" class="dataframe">\n <thead>\n <tr style="text-align: right;">\n <th></th>\n <th>Estimator</th>\n <th>Std. Error</th>\n <th>lower bound CI</th>\n <th>upper bound CI</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>OLS</th>\n <td>-0.009378</td>\n <td>0.029888</td>\n <td>-0.0706</td>\n <td>0.051844</td>\n </tr>\n </tbody>\n</table>'
|     | estimator | standard error | lower bound CI | upper bound CI |
|-----|-----------|----------------|----------------|----------------|
| OLS | -0.009    | 0.030          | -0.071         | 0.052          |
8.2.2 Lasso
Least squares provides a rather noisy estimate (high standard error) of the speed of convergence, and does not allow us to answer the question about the convergence hypothesis since the confidence interval includes zero. In contrast, we can use the partialling-out approach based on lasso regression (“Double Lasso”).
Y <- growth[, 1, drop = F] # output variable
W <- as.matrix(growth)[, -c(1, 2, 3)] # controls
D <- growth[, 3, drop = F] # target regressor

Y = growth['Outcome']
W = growth.drop(['Outcome', 'intercept', 'gdpsh465'], axis=1)
D = growth['gdpsh465']
We run the partialling-out regressions using Lasso.
r.Y <- rlasso(x = W, y = Y)$res # creates the "residual" output variable
r.D <- rlasso(x = W, y = D)$res # creates the "residual" target regressor
partial.lasso <- lm(r.Y ~ r.D)
est_lasso <- partial.lasso$coef[2]
std_lasso <- summary(partial.lasso)$coef[2, 2]
ci_lasso <- confint(partial.lasso)[2, ]
from sklearn import linear_model

# Set the penalty level for Lasso
lasso_model = linear_model.Lasso(alpha = 0.00077)
r_Y = Y - lasso_model.fit(W, Y).predict(W) # partial out the controls from Y
## C:\Users\MSI-NB\ANACON~1\envs\TENSOR~2\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:530: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 0.0058288944773880885, tolerance: 2.3434976975716032e-05
## model = cd_fast.enet_coordinate_descent(
r_Y = r_Y.rename('r_Y')
r_D = D - lasso_model.fit(W, D).predict(W) # partial out the controls from D
## C:\Users\MSI-NB\ANACON~1\envs\TENSOR~2\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:530: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2.99724470007767, tolerance: 0.007147912790119585
## model = cd_fast.enet_coordinate_descent(
r_D = r_D.rename('r_D')
partial_lasso_fit = sm.OLS(r_Y, r_D).fit() # regress the residuals

est_lasso = partial_lasso_fit.summary2().tables[1]['Coef.']['r_D']
std_lasso = partial_lasso_fit.summary2().tables[1]['Std.Err.']['r_D']
lower_ci_lasso = partial_lasso_fit.summary2().tables[1]['[0.025']['r_D']
upper_ci_lasso = partial_lasso_fit.summary2().tables[1]['0.975]']['r_D']
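The ConvergenceWarning messages above indicate that scikit-learn's coordinate-descent solver stopped before reaching its default tolerance at this small, hand-picked penalty level. A minimal adjustment, not part of the original code, is to allow the solver more iterations when constructing the Lasso:

# A hedged tweak (an assumption, not the author's original setting): raise max_iter so the
# coordinate-descent solver can converge at the small penalty alpha = 0.00077.
lasso_model = linear_model.Lasso(alpha = 0.00077, max_iter = 100000)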
Now we export the results into a table.
table <- matrix(0, 1, 4)
table[1, 1:4] <- c(est_lasso, std_lasso, ci_lasso[1], ci_lasso[2])
colnames(table) <- c("estimator", "standard error", "lower bound CI", "upper bound CI")
rownames(table) <- c("Double Lasso")
tab <- xtable(table, digits = 3)
print(tab, type = "html") # set type="latex" for printing table in LaTeX
table_2 = np.zeros((1, 4))
table_2[0, 0] = est_lasso
table_2[0, 1] = std_lasso
table_2[0, 2] = lower_ci_lasso
table_2[0, 3] = upper_ci_lasso
table_2_pandas = pd.DataFrame(table_2, columns = ["Estimator", "Std. Error", "lower bound CI", "upper bound CI"])
table_2_pandas.index = ["LASSO"]
table_2_pandas
|              | estimator | standard error | lower bound CI | upper bound CI |
|--------------|-----------|----------------|----------------|----------------|
| Double Lasso | -0.050    | 0.014          | -0.078         | -0.022         |
Lasso provides a more precise estimate (lower standard error). The Lasso-based point estimate of the speed of convergence is about \(-5\%\), and the \(95\%\) confidence interval, \([-7.8\%, -2.2\%]\), contains only negative values. This empirical evidence does support the convergence hypothesis.
Note: Alternatively, one could also use the rlassoEffect function from the hdm package, which directly applies the partialling-out approach.
lasso.effect = rlassoEffect(x = W, y = Y, d = D, method = "partialling out")
lasso.effect
print("no package in python")
8.2.3 Summary results
Finally, let us have a look at the results.
table <- matrix(0, 2, 4)
table[1, 1:4] <- c(est_ols, std_ols, ci_ols[1], ci_ols[2])
table[2, 1:4] <- c(est_lasso, std_lasso, ci_lasso[1], ci_lasso[2])
colnames(table) <- c("estimator", "standard error", "lower bound CI", "upper bound CI")
rownames(table) <- c("OLS", "Double Lasso")
tab <- xtable(table, digits = 3)
# print(tab, type="html") # set type="latex" for printing table in LaTeX
table
## estimator standard error lower bound CI upper bound CI
## OLS -0.009377989 0.02988773 -0.07060022 0.05184424
## Double Lasso -0.049811465 0.01393636 -0.07750705 -0.02211588
table_2 = np.zeros((1, 4))
table_2[0, 0] = est_lasso
table_2[0, 1] = std_lasso
table_2[0, 2] = lower_ci_lasso
table_2[0, 3] = upper_ci_lasso
table_2_pandas = pd.DataFrame(table_2, columns = ["Estimator", "Std. Error", "lower bound CI", "upper bound CI"])
table_2_pandas.index = ["Double LASSO"]
# DataFrame.append is removed in recent pandas; pd.concat stacks the OLS and Double Lasso rows.
table_3 = pd.concat([table_1_pandas, table_2_pandas])
table_3
## Estimator Std. Error lower bound CI upper bound CI
## OLS -0.009378 0.029888 -0.070600 0.051844
## Double LASSO -0.047747 0.017705 -0.082926 -0.012567
The least squares method provides a rather noisy estimate of the speed of convergence; with it, we cannot answer the question of whether poor countries grow faster than rich countries. Least squares does not work well when the ratio \(p/n\) is large.
In sharp contrast, partialling-out via Lasso provides a more precise estimate. The Lasso-based point estimate is \(-5\%\), and the \(95\%\) confidence interval for the (annual) speed of convergence, \([-7.8\%, -2.2\%]\), includes only negative numbers. This empirical evidence does support the convergence hypothesis. (The Python point estimate in the table above differs slightly from the R one because the scikit-learn Lasso uses a hand-picked penalty, whereas rlasso chooses its penalty by a plug-in rule.)
|              | estimator | standard error | lower bound CI | upper bound CI |
|--------------|-----------|----------------|----------------|----------------|
| OLS          | -0.009    | 0.030          | -0.071         | 0.052          |
| Double Lasso | -0.050    | 0.014          | -0.078         | -0.022         |