Korvex91
Korvex91

Reputation: 11

Calculate Newey West adjusted t-statistic with statsmodel libary after FamaMac-Beth regression

I am currently running a multivariate Fama-MacBeth regression (Fama and MacBeth, 1973) on a dataset containing the risk factors MRI, SMB and HML in this example. In my analysis, I need to exclude certain variables (in this example HML) before a certain time point (in this case 31-05-2023). The final result should combine the two regression results to get a single coefficient and t-value for each risk factor.

I believe that I have managed to run the regression and combine the results correctly; however, I am having difficulty calculating the t-values correctly. I have adjusted the standard errors according to Newey-West to account for autocorrelation and heteroskedasticity. However, I find it difficult to extract these Newey-West adjusted standard errors from the regression results and calculate the t-values on this basis.

In my current approach, the standard error is calculated based on the parameter values in the fm_summary function. However, the parameter values output by the regression are not Newey-West adjusted; only the standard errors are. This means that my t-values in the results, while fitting per se, are not Newey West adjusted.

I have also tried using the FamaMacBeth.from_formula method from the linearmodels library, which provides a Newey-West adjustment. However, with this approach I am having problems combining the results from the different time periods correctly.

I would appreciate any advice or guidance on how to proceed with this analysis. Thank you in advance for your help!

Here is a breakdown example of my current approach:

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

# Create time periods
# pd.date_range generates a range of equally spaced dates
time_periods = pd.date_range(start='2023-01-01', periods=10, freq='M')

# Create 'permno' (stock identifier) 
# np.repeat repeats elements of an array.
permno = np.repeat(np.arange(1, 11), len(time_periods))

# Generate 'RET' (Return) values
# np.random.uniform generates uniformly distributed random numbers
ret = np.random.uniform(low=0, high=1, size=len(permno))

# Generate risk factors
mrt = np.random.uniform(low=0, high=1, size=len(permno))
smb = np.random.uniform(low=0, high=1, size=len(permno))
hml = np.random.uniform(low=0, high=1, size=len(permno))

# Create dataframe
data = {
    'permno': permno,
    'DATE': np.repeat(time_periods, 10),
    'RET': ret,
    'MRT': mrt,
    'SMB': smb,
    'HML': hml
}

df = pd.DataFrame(data)

# Set multi-index for the dataframe
df.set_index(['permno', 'DATE'], inplace=True)

# Function to compute OLS coefficients
def ols_coef(x, formula):
    # smf.ols fits an ordinary least squares linear regression model
    # cov_type='HAC' specifies heteroskedasticity and autocorrelation consistent (HAC) covariance estimator
    # cov_kwds={'maxlags': 12} sets maximum lags to 12 in kernel
    params = smf.ols(formula, data=x).fit(cov_type='HAC', cov_kwds={'maxlags': 12}).params
    return params * 100  

# Function to compute summary statistics
def fm_summary(p):
    s = p.describe().T
    # Compute standard error as standard deviation divided by square root of number of observations
    s['std_error'] = s['std'] / np.sqrt(s['count'])
    # Compute t-statistic as mean divided by standard error
    s['tstat'] = s['mean'] / s['std_error']
    return s[['mean', 'tstat']]

# Specify regression formulas for periods A and B
formula_A = 'RET ~ MRT + SMB '
formula_B = 'RET ~ MRT + SMB + HML'

# Subset data for periods A and B based on DATE
timeperiod_A = df[df.index.get_level_values('DATE') < pd.Timestamp('2023-05-31')]
timeperiod_B = df[df.index.get_level_values('DATE') >= pd.Timestamp('2023-05-31')]

# Apply ols_coef function to each group of data (grouped by DATE)
gammaA = timeperiod_A.groupby('DATE').apply(ols_coef, formula_A)
gammaB = timeperiod_B.groupby('DATE').apply(ols_coef, formula_B)

# Concatenate the two sets of results and sort by index
gamma = pd.concat([gammaA, gammaB]).sort_index()

# Compute summary statistics for the coefficients
summary = fm_summary(gamma)

# Print the summary statistics
print(summary)

I have previously attempted to perform the regressions using the linearmodels library. Although this method does allow for Newey-West adjustment, I've had difficulties in combining the results from the different periods. Furthermore, I am unsure of the correct procedure to weigh the results for each period.

In the provided example, I extracted the standard errors using the mod.bse method and computed the mean over all periods. I then approximated the t-values using the formula tvalue = p.mean() / bse.mean(). Unfortunately, this approach does not yield the correct t-values.

Upvotes: 1

Views: 462

Answers (0)

Related Questions