In Python Pandas, how to join on a datetime column another dataframe that is indexed by a periodindex?

I have two dataframes, df1, df2.

df1 has a column 'date', that is a datetime. df2 is indexed by a periodindex (periods of time, instead of timestamps).

I wish I could somehow join the two dataframes so that, for each row of df1, I will add all the columns of df2 for the row (in df2) selected by the periodindex where the df1['date'] belongs.

I hope the question is clear enough. I guess I can manually do some loop to do that, but if there is a more efficient way to do this, using some pandas method, I will appreciate it a lot.

Thanks in advance!

Here are the first columns of df2, the data frame indexed by a periodindex called 'cuatrimestre':

                  dif_precio_vivienda  dif_rendimiento_ibex  \
cuatrimestre                                              
1995Q1                        NaN                   NaN   
1995Q2                   0.000000             -2.940238   
1995Q3                   0.007233             -0.500118   
1995Q4                   0.026514              0.535169   
1996Q1                  -0.009417             -0.171129 

And here the first columns of df1:

    fecha  municipioid   latitud  longitud  numbanyo  numdormitorio  \
25138  2014-02-12         4353  0.705444 -0.064720       1.0            1.0   
25144  2014-05-06         4353  0.705444 -0.064720       1.0            1.0   
25185  2014-01-02         4353  0.705349 -0.064618       1.0            1.0   
25186  2014-02-12         4353  0.705353 -0.064620       1.0            1.0   
25201  2014-07-07         4353  0.705314 -0.064610       1.0            3.0   

You can see that the index is diferent, but it has a datetime column ('fecha'), and I wanted to join on this column over the periodindex...

Upvotes: 1

Views: 915

Answers (1)

jezrael
jezrael

Reputation: 862641

I think you can first create columns year and quarter in both DataFrames by dt.year and quarter in df1 and str.split with converting to int by astype in df2. Then you can merge them by columns year and quarter:

#if type of column fecha is to datetime, convert it
df1['fecha'] = pd.to_datetime(df1['fecha'])
df1['year'] = df1['fecha'].dt.year
df1['quarter'] = df1['fecha'].dt.quarter
print df1
           fecha  municipioid   latitud  longitud  numbanyo  numdormitorio  \
25138 2014-02-12         4353  0.705444 -0.064720       1.0            1.0   
25144 2014-05-06         4353  0.705444 -0.064720       1.0            1.0   
25185 2014-01-02         4353  0.705349 -0.064618       1.0            1.0   
25186 2014-02-12         4353  0.705353 -0.064620       1.0            1.0   
25201 2014-07-07         4353  0.705314 -0.064610       1.0            3.0   

       year  quarter  
25138  2014        1  
25144  2014        2  
25185  2014        1  
25186  2014        1  
25201  2014        3  

df2 = df2.reset_index()
df2[['year','quarter']] = df2['cuatrimestre'].str.split('Q', expand=True)
df2['year'] = df2['year'].astype(int)
df2['quarter'] = df2['quarter'].astype(int)
print df2
  cuatrimestre  dif_precio_vivienda  dif_rendimiento_ibex  year  quarter
0       2014Q1                  NaN                   NaN  2014        1
1       2014Q2             0.000000             -2.940238  2014        2
2       1995Q3             0.007233             -0.500118  1995        3
3       1995Q4             0.026514              0.535169  1995        4
4       1996Q1            -0.009417             -0.171129  1996        1
print pd.merge(df1,df2, on=['year','quarter'], how='left')
       fecha  municipioid   latitud  longitud  numbanyo  numdormitorio  year  \
0 2014-02-12         4353  0.705444 -0.064720       1.0            1.0  2014   
1 2014-05-06         4353  0.705444 -0.064720       1.0            1.0  2014   
2 2014-01-02         4353  0.705349 -0.064618       1.0            1.0  2014   
3 2014-02-12         4353  0.705353 -0.064620       1.0            1.0  2014   
4 2014-07-07         4353  0.705314 -0.064610       1.0            3.0  2014   

   quarter cuatrimestre  dif_precio_vivienda  dif_rendimiento_ibex  
0        1       2014Q1                  NaN                   NaN  
1        2       2014Q2                  0.0             -2.940238  
2        1       2014Q1                  NaN                   NaN  
3        1       2014Q1                  NaN                   NaN  
4        3          NaN                  NaN                   NaN  

Upvotes: 1

Related Questions