Reputation: 1702
I have two dataframes I'm looking to concatenate. Dataframe b
is a subset of a
. Each has a datetimeindex, a
is in YYYY-MM-DD format, b
is in month-end YYYY-MM-DD format.
There are a number of ways (using some manner of pd.merge/join/concat
) to join the two by formatting or otherwise modifying the index to match, returning the intersection of the two, but I need the resulting df to be the intersection inclusive of dataframe a
's index.
Sample data below:
a
date ret
2006-03-24 -0.013630799008283678
2006-03-27 -0.007940256570502124
2006-03-28 -0.0014077900025659673
2006-03-29 0.00561930769215746
2006-03-30 0.01827696914620336
2006-03-31 0.0038897200398240184
2006-04-03 -0.0022862378499624025
2006-04-04 0.011605546120307841
2006-04-05 0.018826194747782576
2006-04-06 0.0037673174750787464
2006-04-07 -0.009780029053639616
2006-04-10 0.005346415051967202
b
date ret
2006-04-30 0.09429241029317115
2006-07-31 0.04774776287321525
2006-10-31 0.06057447262257265
2007-01-31 -0.02304839543629089
2007-02-28 0.022269124015597108
The result set (given the above) would be:
date ret
2006-04-03 -0.0022862378499624025
2006-04-04 0.011605546120307841
2006-04-05 0.018826194747782576
2006-04-06 0.0037673174750787464
2006-04-07 -0.009780029053639616
2006-04-10 0.005346415051967202
My endgoal is to identify the dates in a
that are in the combination months/years in the second dataframe, b
. If there's a better way to go about that aside from a merge/join/concat, I'm all ears.
Upvotes: 1
Views: 351
Reputation: 863741
You can convert DatetimeIndex
to month periods by DatetimeIndex.to_period
and then merge
:
a['per'] = a.index.to_period('m')
b['per'] = b.index.to_period('m')
df1 = a.reset_index().merge(b[['per']], on='per')
print (df1)
date ret per
0 2006-04-03 -0.002286 2006-04
1 2006-04-04 0.011606 2006-04
2 2006-04-05 0.018826 2006-04
3 2006-04-06 0.003767 2006-04
4 2006-04-07 -0.009780 2006-04
5 2006-04-10 0.005346 2006-04
Another idea with Index.isin
and boolean indexing
:
df2 = a[a.index.to_period('m').isin(b.index.to_period('m'))]
print (df2)
ret
date
2006-04-03 -0.002286
2006-04-04 0.011606
2006-04-05 0.018826
2006-04-06 0.003767
2006-04-07 -0.009780
2006-04-10 0.005346
Upvotes: 1