Reputation: 11
I am currently running a multivariate Fama-MacBeth regression (Fama and MacBeth, 1973) on a dataset containing the risk factors MRI, SMB and HML in this example. In my analysis, I need to exclude certain variables (in this example HML) before a certain time point (in this case 31-05-2023). The final result should combine the two regression results to get a single coefficient and t-value for each risk factor.
I believe that I have managed to run the regression and combine the results correctly; however, I am having difficulty calculating the t-values correctly. I have adjusted the standard errors according to Newey-West to account for autocorrelation and heteroskedasticity. However, I find it difficult to extract these Newey-West adjusted standard errors from the regression results and calculate the t-values on this basis.
In my current approach, the standard error is calculated based on the parameter values in the fm_summary function. However, the parameter values output by the regression are not Newey-West adjusted; only the standard errors are. This means that my t-values in the results, while fitting per se, are not Newey West adjusted.
I have also tried using the FamaMacBeth.from_formula method from the linearmodels library, which provides a Newey-West adjustment. However, with this approach I am having problems combining the results from the different time periods correctly.
I would appreciate any advice or guidance on how to proceed with this analysis. Thank you in advance for your help!
Here is a breakdown example of my current approach:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
# Create time periods
# pd.date_range generates a range of equally spaced dates
time_periods = pd.date_range(start='2023-01-01', periods=10, freq='M')
# Create 'permno' (stock identifier)
# np.repeat repeats elements of an array.
permno = np.repeat(np.arange(1, 11), len(time_periods))
# Generate 'RET' (Return) values
# np.random.uniform generates uniformly distributed random numbers
ret = np.random.uniform(low=0, high=1, size=len(permno))
# Generate risk factors
mrt = np.random.uniform(low=0, high=1, size=len(permno))
smb = np.random.uniform(low=0, high=1, size=len(permno))
hml = np.random.uniform(low=0, high=1, size=len(permno))
# Create dataframe
data = {
'permno': permno,
'DATE': np.repeat(time_periods, 10),
'RET': ret,
'MRT': mrt,
'SMB': smb,
'HML': hml
}
df = pd.DataFrame(data)
# Set multi-index for the dataframe
df.set_index(['permno', 'DATE'], inplace=True)
# Function to compute OLS coefficients
def ols_coef(x, formula):
# smf.ols fits an ordinary least squares linear regression model
# cov_type='HAC' specifies heteroskedasticity and autocorrelation consistent (HAC) covariance estimator
# cov_kwds={'maxlags': 12} sets maximum lags to 12 in kernel
params = smf.ols(formula, data=x).fit(cov_type='HAC', cov_kwds={'maxlags': 12}).params
return params * 100
# Function to compute summary statistics
def fm_summary(p):
s = p.describe().T
# Compute standard error as standard deviation divided by square root of number of observations
s['std_error'] = s['std'] / np.sqrt(s['count'])
# Compute t-statistic as mean divided by standard error
s['tstat'] = s['mean'] / s['std_error']
return s[['mean', 'tstat']]
# Specify regression formulas for periods A and B
formula_A = 'RET ~ MRT + SMB '
formula_B = 'RET ~ MRT + SMB + HML'
# Subset data for periods A and B based on DATE
timeperiod_A = df[df.index.get_level_values('DATE') < pd.Timestamp('2023-05-31')]
timeperiod_B = df[df.index.get_level_values('DATE') >= pd.Timestamp('2023-05-31')]
# Apply ols_coef function to each group of data (grouped by DATE)
gammaA = timeperiod_A.groupby('DATE').apply(ols_coef, formula_A)
gammaB = timeperiod_B.groupby('DATE').apply(ols_coef, formula_B)
# Concatenate the two sets of results and sort by index
gamma = pd.concat([gammaA, gammaB]).sort_index()
# Compute summary statistics for the coefficients
summary = fm_summary(gamma)
# Print the summary statistics
print(summary)
I have previously attempted to perform the regressions using the linearmodels library. Although this method does allow for Newey-West adjustment, I've had difficulties in combining the results from the different periods. Furthermore, I am unsure of the correct procedure to weigh the results for each period.
In the provided example, I extracted the standard errors using the mod.bse method and computed the mean over all periods. I then approximated the t-values using the formula tvalue = p.mean() / bse.mean(). Unfortunately, this approach does not yield the correct t-values.
Upvotes: 1
Views: 462