Chris
Chris

Reputation: 1702

Concatenate two dataframes on datetimeindex of different format

I have two dataframes I'm looking to concatenate. Dataframe b is a subset of a. Each has a datetimeindex, a is in YYYY-MM-DD format, b is in month-end YYYY-MM-DD format.

There are a number of ways (using some manner of pd.merge/join/concat) to join the two by formatting or otherwise modifying the index to match, returning the intersection of the two, but I need the resulting df to be the intersection inclusive of dataframe a's index.

Sample data below:

a

date    ret
2006-03-24  -0.013630799008283678
2006-03-27  -0.007940256570502124
2006-03-28  -0.0014077900025659673
2006-03-29  0.00561930769215746
2006-03-30  0.01827696914620336
2006-03-31  0.0038897200398240184
2006-04-03  -0.0022862378499624025
2006-04-04  0.011605546120307841
2006-04-05  0.018826194747782576
2006-04-06  0.0037673174750787464
2006-04-07  -0.009780029053639616
2006-04-10  0.005346415051967202

b

date    ret
2006-04-30  0.09429241029317115
2006-07-31  0.04774776287321525
2006-10-31  0.06057447262257265
2007-01-31  -0.02304839543629089
2007-02-28  0.022269124015597108

The result set (given the above) would be:

date         ret
2006-04-03  -0.0022862378499624025
2006-04-04  0.011605546120307841
2006-04-05  0.018826194747782576
2006-04-06  0.0037673174750787464
2006-04-07  -0.009780029053639616
2006-04-10  0.005346415051967202

My endgoal is to identify the dates in a that are in the combination months/years in the second dataframe, b. If there's a better way to go about that aside from a merge/join/concat, I'm all ears.

Upvotes: 1

Views: 351

Answers (1)

jezrael
jezrael

Reputation: 863741

You can convert DatetimeIndex to month periods by DatetimeIndex.to_period and then merge:

a['per'] = a.index.to_period('m')
b['per'] = b.index.to_period('m')

df1 = a.reset_index().merge(b[['per']], on='per')
print (df1)
        date       ret      per
0 2006-04-03 -0.002286  2006-04
1 2006-04-04  0.011606  2006-04
2 2006-04-05  0.018826  2006-04
3 2006-04-06  0.003767  2006-04
4 2006-04-07 -0.009780  2006-04
5 2006-04-10  0.005346  2006-04

Another idea with Index.isin and boolean indexing:

df2 = a[a.index.to_period('m').isin(b.index.to_period('m'))]
print (df2)
                 ret
date                
2006-04-03 -0.002286
2006-04-04  0.011606
2006-04-05  0.018826
2006-04-06  0.003767
2006-04-07 -0.009780
2006-04-10  0.005346

Upvotes: 1

Related Questions