Olive
Olive

Reputation: 644

merge to replace nans of the same column in pandas dataframe?

I have the following dataframe to which I want to merge multiple dataframes to, this df consist of ID, date, and many other variables..

ID date     ..other variables...
A  2017Q1  
A  2017Q2  
A  2018Q1 
B  2017Q1  
B  2017Q2  
B  2017Q3  
C  2018Q1  
C  2018Q2
..   ..

And i have a bunch of dataframes (by quarter) that has asset holdings information

df_2017Q1:

ID  date   asset_holdings
A  2017Q1    1
B  2017Q1    2
C  2017Q1    4
...

df_2017Q2

ID  date   asset_holdings
A  2017Q2    2
B  2017Q2    5
C  2017Q2    4
...

df_2017Q3

ID  date   asset_holdings
A  2017Q3    1
B  2017Q3    2
C  2017Q3    10
...

df_2017Q4..

ID  date   asset_holdings
A  2017Q4    10
B  2017Q4    20
C  2017Q4    14
...

df_2018Q1..

ID  date   asset_holdings
A  2018Q1    11
B  2018Q1    23
C  2018Q1    15
...

df_2018Q2...

ID  date   asset_holdings
A  2018Q2    11
B  2018Q2    26
C  2018Q2    19
...

....

desired output

ID date   asset_holdings  ..other variables...
A  2017Q1   1
A  2017Q2   2
A  2018Q1   11
B  2017Q1   2
B  2017Q2   5
B  2017Q3   2
C  2018Q1   15
C  2018Q2   19
..   ..

I think merging on ID and date, should do it but this will create + n columns which I do not want, so I want to create a column "asset_holdings" and merge the right dfs while updating NAN values. But not sure if this is the smartest way. Any help will be appreciated!

Upvotes: 0

Views: 242

Answers (1)

mosc9575
mosc9575

Reputation: 6337

Try to use pd.concat() to concatenate your different DataFrames and then use sort_values(['ID', 'date']) to sort the values by the columns ID and date.

See the example below as demonstration.

import pandas as pd

df1 = pd.DataFrame({'ID':list('ABCD'), 'date':['2017Q1']*4, 'other':[1,2,3,4]})
df2 = pd.DataFrame({'ID':list('ABCD'), 'date':['2017Q2']*4, 'other':[4,3,2,1]})
df3 = pd.DataFrame({'ID':list('ABCD'), 'date':['2018Q1']*4, 'other':[7,6,5,4]})

ans = pd.concat([df1, df2, df3]).sort_values(['ID', 'date'], ignore_index=True)
>>> ans
   ID    date  other
0   A  2017Q1      1
1   A  2017Q2      4
2   A  2018Q1      7
3   B  2017Q1      2
4   B  2017Q2      3
5   B  2018Q1      6
6   C  2017Q1      3
7   C  2017Q2      2
8   C  2018Q1      5
9   D  2017Q1      4
10  D  2017Q2      1
11  D  2018Q1      4

Upvotes: 1

Related Questions