Reputation: 4607
Currently I have two dataframes that look like this:
FSample
GMSample
What I want is something that ideally looks like this:
I attempted to do something similar to
result = pd.concat([FSample,GMSample],axis=1)
result
But my result has the data stacked on top of each other.
Then I attempted to use the merge command like this
result = pd.merge(FSample,GMSample,how='inner',on='Date')
result
From that I got a KeyError on 'Date'
So I feel like I am missing both an understanding of how I should be trying to combine these dataframes (i.e. multi-index?) and the syntax to do so properly.
Upvotes: 0
Views: 5150
Reputation: 143
You get a key error, because the Date is an index, whereas the "on" keyword in merge takes a column. Alternatively, you could remove Symbol from the indexes and then join the dataframes by the Date indexes.
FSample.reset_index("Symbol").join(GMSample.reset_index("Symbol"), lsuffix="_x", rsuffix="_y")
Upvotes: 1
Reputation: 4463
Working with MultiIndexes in pandas usually requires you to constantly set/reset the index. That is probably going to be the easiest thing to do in this case as well, as pd.merge
does not immediately support merging on specific levels of a MultiIndex.
df_f = pd.DataFrame(
data = {
'Symbol': ['F'] * 5,
'Date': pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06', '2012-01-09']),
'Close': [11.13, 11.30, 11.59, 11.71, 11.80],
},
).set_index(['Symbol', 'Date']).sort_index()
df_gm = pd.DataFrame(
data = {
'Symbol': ['GM'] * 5,
'Date': pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06', '2012-01-09']),
'Close': [21.05, 21.15, 22.17, 22.92, 22.84],
},
).set_index(['Symbol', 'Date']).sort_index()
pd.merge(df_f.reset_index(level='Date'),
df_gm.reset_index(level='Date'),
how='inner',
on='Date',
suffixes=('_F', '_GM')
).set_index('Date')
The result:
Close_F Close_GM
Date
2012-01-03 11.13 21.05
2012-01-04 11.30 21.15
2012-01-05 11.59 22.17
2012-01-06 11.71 22.92
2012-01-09 11.80 22.84
Upvotes: 1