Reputation: 644
I have the following dataframe to which I want to merge multiple dataframes to, this df consist of ID, date, and many other variables..
ID date ..other variables...
A 2017Q1
A 2017Q2
A 2018Q1
B 2017Q1
B 2017Q2
B 2017Q3
C 2018Q1
C 2018Q2
.. ..
And i have a bunch of dataframes (by quarter) that has asset holdings information
df_2017Q1:
ID date asset_holdings
A 2017Q1 1
B 2017Q1 2
C 2017Q1 4
...
df_2017Q2
ID date asset_holdings
A 2017Q2 2
B 2017Q2 5
C 2017Q2 4
...
df_2017Q3
ID date asset_holdings
A 2017Q3 1
B 2017Q3 2
C 2017Q3 10
...
df_2017Q4..
ID date asset_holdings
A 2017Q4 10
B 2017Q4 20
C 2017Q4 14
...
df_2018Q1..
ID date asset_holdings
A 2018Q1 11
B 2018Q1 23
C 2018Q1 15
...
df_2018Q2...
ID date asset_holdings
A 2018Q2 11
B 2018Q2 26
C 2018Q2 19
...
....
desired output
ID date asset_holdings ..other variables...
A 2017Q1 1
A 2017Q2 2
A 2018Q1 11
B 2017Q1 2
B 2017Q2 5
B 2017Q3 2
C 2018Q1 15
C 2018Q2 19
.. ..
I think merging on ID and date, should do it but this will create + n columns which I do not want, so I want to create a column "asset_holdings" and merge the right dfs while updating NAN values. But not sure if this is the smartest way. Any help will be appreciated!
Upvotes: 0
Views: 242
Reputation: 6337
Try to use pd.concat()
to concatenate your different DataFrames and then use sort_values(['ID', 'date'])
to sort the values by the columns ID
and date
.
See the example below as demonstration.
import pandas as pd
df1 = pd.DataFrame({'ID':list('ABCD'), 'date':['2017Q1']*4, 'other':[1,2,3,4]})
df2 = pd.DataFrame({'ID':list('ABCD'), 'date':['2017Q2']*4, 'other':[4,3,2,1]})
df3 = pd.DataFrame({'ID':list('ABCD'), 'date':['2018Q1']*4, 'other':[7,6,5,4]})
ans = pd.concat([df1, df2, df3]).sort_values(['ID', 'date'], ignore_index=True)
>>> ans
ID date other
0 A 2017Q1 1
1 A 2017Q2 4
2 A 2018Q1 7
3 B 2017Q1 2
4 B 2017Q2 3
5 B 2018Q1 6
6 C 2017Q1 3
7 C 2017Q2 2
8 C 2018Q1 5
9 D 2017Q1 4
10 D 2017Q2 1
11 D 2018Q1 4
Upvotes: 1