Joseph U.
Joseph U.

Reputation: 4607

How do I merge two pandas dataframes by time series index?

Currently I have two dataframes that look like this:

FSample

enter image description here

GMSample

enter image description here

What I want is something that ideally looks like this:

enter image description here

I attempted to do something similar to

result = pd.concat([FSample,GMSample],axis=1)
result

But my result has the data stacked on top of each other. enter image description here

Then I attempted to use the merge command like this

result = pd.merge(FSample,GMSample,how='inner',on='Date')
result

From that I got a KeyError on 'Date'

So I feel like I am missing both an understanding of how I should be trying to combine these dataframes (i.e. multi-index?) and the syntax to do so properly.

Upvotes: 0

Views: 5150

Answers (2)

knirb
knirb

Reputation: 143

You get a key error, because the Date is an index, whereas the "on" keyword in merge takes a column. Alternatively, you could remove Symbol from the indexes and then join the dataframes by the Date indexes.

FSample.reset_index("Symbol").join(GMSample.reset_index("Symbol"), lsuffix="_x", rsuffix="_y")

Upvotes: 1

Micah Smith
Micah Smith

Reputation: 4463

Working with MultiIndexes in pandas usually requires you to constantly set/reset the index. That is probably going to be the easiest thing to do in this case as well, as pd.merge does not immediately support merging on specific levels of a MultiIndex.

df_f = pd.DataFrame(
    data = {
        'Symbol': ['F'] * 5,
        'Date': pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06', '2012-01-09']),
        'Close': [11.13, 11.30, 11.59, 11.71, 11.80],
    },
).set_index(['Symbol', 'Date']).sort_index()
df_gm = pd.DataFrame(
    data = {
        'Symbol': ['GM'] * 5,
        'Date': pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06', '2012-01-09']),
        'Close': [21.05, 21.15, 22.17, 22.92, 22.84],
    },
).set_index(['Symbol', 'Date']).sort_index()

pd.merge(df_f.reset_index(level='Date'),
         df_gm.reset_index(level='Date'),
         how='inner',
         on='Date',
         suffixes=('_F', '_GM')
).set_index('Date')

The result:

    Close_F     Close_GM
Date        
2012-01-03  11.13   21.05
2012-01-04  11.30   21.15
2012-01-05  11.59   22.17
2012-01-06  11.71   22.92
2012-01-09  11.80   22.84

Upvotes: 1

Related Questions