CSBossmann

Reputation: 201

Panel regression gives error "exog does not have full column rank"

I am trying to estimate a panel regression (see: https://bashtage.github.io/linearmodels/doc/panel/examples/examples.html)

My data is formatted like this (that's just an example snippet; the original file has 11 columns plus the timestamp, and thousands of rows):

What I have

Timestamp   Country Dummy   Pre Post    All_Countries   Timestamp
1993-11-01  1               0   1       6.18    1993-11-01
1993-11-02  1               0   1       6.18    1993-11-02
1993-11-03  1               0   1       6.17    1993-11-03
1993-11-04  1               1   0       6.17    1993-11-04
1993-11-15  1               1   0       6.40    1993-11-15
1993-11-01  2               0   1       7.05    1993-11-01
1993-11-02  2               0   1       7.05    1993-11-02
1993-11-03  2               0   1       7.20    1993-11-03
1993-11-04  2               1   0       7.50    1993-11-04
1993-11-15  2               1   0       7.60    1993-11-15
1993-11-01  3               0   1       7.69    1993-11-01
1993-11-02  3               0   1       7.61    1993-11-02
1993-11-03  3               0   1       7.67    1993-11-03
1993-11-04  3               1   0       7.91    1993-11-04
1993-11-15  3               1   0       8.61    1993-11-15

How you can re-create it

import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        "Timestamp": ['1993-11-01', '1993-11-02', '1993-11-03', '1993-11-04', '1993-11-15'],
        "Pre": [0, 0, 0, 1, 1],
        "Post": [1, 1, 1, 0, 0],
        "Austria": [6.18, 6.18, 6.17, 6.17, 6.40],
        "Belgium": [7.05, 7.05, 7.2, 7.5, 7.6],
        "France": [7.69, 7.61, 7.67, 7.91, 8.61],
    },
    index=[1, 2, 3, 4, 5],
)
df


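# Reshape from wide (one column per country) to long (one row per country and date)
index_data = df.melt(['Timestamp','Pre','Post'], var_name='Country Dummy', value_name='All_Countries')

# Replace the country names with integer codes 1, 2, 3
index_data['Country Dummy'] = index_data['Country Dummy'].factorize()[0] + 1
# alternative: pd.Categorical(index_data['Country Dummy']).codes + 1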
timestamp = pd.Categorical(index_data['Timestamp'])
index_data = index_data.set_index(['Timestamp', 'Country Dummy'])
index_data['Timestamp'] = timestamp
index_data

What I do

!pip install linearmodels
from linearmodels.panel import PooledOLS
import statsmodels.api as sm
exog_vars = ['Pre','Post']
exog = sm.add_constant(index_data[exog_vars])
mod = PooledOLS(index_data.All_Countries, exog)
pooled_res = mod.fit()
print(pooled_res)

What I get

"ValueError: exog does not have full column rank."

Question

Does anyone have an idea what could cause this problem?

Idea

Is it because my data should be formatted like this instead (see the example in the link at the top)? And if so, how could I get it into that shape?

Timestamp   Country Dummy   Pre Post    All_Countries   Timestamp
1993-11-01  1               0   1       6.18    1993-11-01
1993-11-02                  0   1       6.18    1993-11-02
1993-11-03                  0   1       6.17    1993-11-03
1993-11-04                  1   0       6.17    1993-11-04
1993-11-15                  1   0       6.40    1993-11-15
1993-11-01  2               0   1       7.05    1993-11-01
1993-11-02                  0   1       7.05    1993-11-02
1993-11-03                  0   1       7.20    1993-11-03
1993-11-04                  1   0       7.50    1993-11-04
1993-11-15                  1   0       7.60    1993-11-15
1993-11-01  3               0   1       7.69    1993-11-01
1993-11-02                  0   1       7.61    1993-11-02
1993-11-03                  0   1       7.67    1993-11-03
1993-11-04                  1   0       7.91    1993-11-04
1993-11-15                  1   0       8.61    1993-11-15

Upvotes: 2

Views: 4540

Answers (1)

Juan C

Reputation: 6132

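That error is raised because, once the constant is added, your regressors are perfectly collinear: Pre is a linear combination of Post and the constant. You should use only one of those columns, because the other adds no information (and it makes the design matrix rank-deficient, so the OLS coefficients are not uniquely determined). In this case: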

Pre = 1 - Post

This is the same reason you drop one dummy, which then serves as the baseline, when running an OLS model with a constant.
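If you want to see the problem directly, a quick rank check with NumPy makes it visible. This is only a minimal sketch using the five Pre/Post values from the question's toy data; the variable names (`full`, `reduced`, and so on) are mine, and it is not what linearmodels runs internally.

import numpy as np

# Pre/Post values from the question's toy data (one country is enough)
pre = np.array([0, 0, 0, 1, 1])
post = np.array([1, 1, 1, 0, 0])
const = np.ones_like(pre)  # the column that sm.add_constant would add

# Design matrix with constant, Pre and Post: three columns
full = np.column_stack([const, pre, post])
# Design matrix with constant and Post only: two columns
reduced = np.column_stack([const, post])

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)

print(rank_full, "of", full.shape[1], "columns")        # rank is below the column count
print(rank_reduced, "of", reduced.shape[1], "columns")  # rank equals the column count

The first matrix has three columns but only rank 2, because const = Pre + Post in every row. Dropping either dummy restores full column rank.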

This should work:

exog_vars = ['Post']
exog = sm.add_constant(index_data[exog_vars])
mod = PooledOLS(index_data.All_Countries, exog)
pooled_res = mod.fit()
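On the layout question: the rank error does not come from how the index is displayed. That said, the linearmodels panel documentation describes a MultiIndex with the entity as the outer level and the time as the inner level, so you may want to set the index in that order and convert the timestamps to datetimes. A rough sketch with pandas only, using a few rows of your toy data (`long_df` and `panel` are names I made up for illustration):

import pandas as pd

# A few rows of the question's data, already in long format
long_df = pd.DataFrame({
    "Timestamp": ["1993-11-01", "1993-11-04", "1993-11-01", "1993-11-04"],
    "Country Dummy": [1, 1, 2, 2],
    "Post": [1, 0, 1, 0],
    "All_Countries": [6.18, 6.17, 7.05, 7.50],
})

# Parse the date strings so the time level is a real datetime
long_df["Timestamp"] = pd.to_datetime(long_df["Timestamp"])

# Entity (country) as the outer level, time as the inner level
panel = long_df.set_index(["Country Dummy", "Timestamp"]).sort_index()
print(panel)

When printed, pandas leaves repeated values of the outer level blank, which is the grouped look you see in the documentation example; the underlying data is the same.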

Upvotes: 3
