Reputation: 309
I have two dataframes, df1, df2.
df1 has a column 'date', that is a datetime. df2 is indexed by a periodindex (periods of time, instead of timestamps).
I wish I could somehow join the two dataframes so that, for each row of df1, I will add all the columns of df2 for the row (in df2) selected by the periodindex where the df1['date'] belongs.
I hope the question is clear enough. I guess I can manually do some loop to do that, but if there is a more efficient way to do this, using some pandas method, I will appreciate it a lot.
Thanks in advance!
Here are the first columns of df2, the data frame indexed by a periodindex called 'cuatrimestre':
dif_precio_vivienda dif_rendimiento_ibex \
cuatrimestre
1995Q1 NaN NaN
1995Q2 0.000000 -2.940238
1995Q3 0.007233 -0.500118
1995Q4 0.026514 0.535169
1996Q1 -0.009417 -0.171129
And here the first columns of df1:
fecha municipioid latitud longitud numbanyo numdormitorio \
25138 2014-02-12 4353 0.705444 -0.064720 1.0 1.0
25144 2014-05-06 4353 0.705444 -0.064720 1.0 1.0
25185 2014-01-02 4353 0.705349 -0.064618 1.0 1.0
25186 2014-02-12 4353 0.705353 -0.064620 1.0 1.0
25201 2014-07-07 4353 0.705314 -0.064610 1.0 3.0
You can see that the index is diferent, but it has a datetime column ('fecha'), and I wanted to join on this column over the periodindex...
Upvotes: 1
Views: 915
Reputation: 862641
I think you can first create columns year
and quarter
in both DataFrames
by dt.year
and quarter
in df1
and str.split
with converting to int
by astype
in df2
. Then you can merge
them by columns year
and quarter
:
#if type of column fecha is to datetime, convert it
df1['fecha'] = pd.to_datetime(df1['fecha'])
df1['year'] = df1['fecha'].dt.year
df1['quarter'] = df1['fecha'].dt.quarter
print df1
fecha municipioid latitud longitud numbanyo numdormitorio \
25138 2014-02-12 4353 0.705444 -0.064720 1.0 1.0
25144 2014-05-06 4353 0.705444 -0.064720 1.0 1.0
25185 2014-01-02 4353 0.705349 -0.064618 1.0 1.0
25186 2014-02-12 4353 0.705353 -0.064620 1.0 1.0
25201 2014-07-07 4353 0.705314 -0.064610 1.0 3.0
year quarter
25138 2014 1
25144 2014 2
25185 2014 1
25186 2014 1
25201 2014 3
df2 = df2.reset_index()
df2[['year','quarter']] = df2['cuatrimestre'].str.split('Q', expand=True)
df2['year'] = df2['year'].astype(int)
df2['quarter'] = df2['quarter'].astype(int)
print df2
cuatrimestre dif_precio_vivienda dif_rendimiento_ibex year quarter
0 2014Q1 NaN NaN 2014 1
1 2014Q2 0.000000 -2.940238 2014 2
2 1995Q3 0.007233 -0.500118 1995 3
3 1995Q4 0.026514 0.535169 1995 4
4 1996Q1 -0.009417 -0.171129 1996 1
print pd.merge(df1,df2, on=['year','quarter'], how='left')
fecha municipioid latitud longitud numbanyo numdormitorio year \
0 2014-02-12 4353 0.705444 -0.064720 1.0 1.0 2014
1 2014-05-06 4353 0.705444 -0.064720 1.0 1.0 2014
2 2014-01-02 4353 0.705349 -0.064618 1.0 1.0 2014
3 2014-02-12 4353 0.705353 -0.064620 1.0 1.0 2014
4 2014-07-07 4353 0.705314 -0.064610 1.0 3.0 2014
quarter cuatrimestre dif_precio_vivienda dif_rendimiento_ibex
0 1 2014Q1 NaN NaN
1 2 2014Q2 0.0 -2.940238
2 1 2014Q1 NaN NaN
3 1 2014Q1 NaN NaN
4 3 NaN NaN NaN
Upvotes: 1