Reputation: 1005
I am trying to append the content of one dataframe into another. Here is basic example of what I am working with:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Id': ['001','001','001','002','002','002','004','004'],
'Date':['2020-01-01','2020-01-02','2020-01-03','2020-01-01','2020-01-02','2020-01-03','2020-01-02','2020-01-03'],
'Quantity': [100,100,100,50,50,50,60,60],
'fx' :[1,1,1,2,2,2,1,1],
'fy' : [1,1,1,3,3,3,1,1]})
df2 = pd.DataFrame({'Id': ['001','001','001','002','002','002', '003'],
'Date':['2019-01-01','2019-01-02','2019-01-03','2019-01-01','2019-01-02','2019-01-03','2019-02-02'],
'Quantity': [100,100,100,50,50,50,20]})
Now I want to append the content of df2 into df1, but the issue is that it results in some NaN here and there in df1
histo = df1.append(df2)
histo = histo.sort_values('Id')
print(histo)
Id Date Quantity fx fy
0 001 2020-01-01 100 1.0 1.0
1 001 2020-01-02 100 1.0 1.0
2 001 2020-01-03 100 1.0 1.0
0 001 2019-01-01 100 NaN NaN
1 001 2019-01-02 100 NaN NaN
2 001 2019-01-03 100 NaN NaN
3 002 2020-01-01 50 2.0 3.0
4 002 2020-01-02 50 2.0 3.0
5 002 2020-01-03 50 2.0 3.0
3 002 2019-01-01 50 NaN NaN
4 002 2019-01-02 50 NaN NaN
5 002 2019-01-03 50 NaN NaN
6 003 2019-02-02 20 NaN NaN
6 004 2020-01-02 60 1.0 1.0
7 004 2020-01-03 60 1.0 1.0
the output that I want to achieve is that for each 'Id' row, the values of fx and fy continue being the same. the result would look like this:
Id Date Quantity fx fy
0 001 2020-01-01 100 1.0 1.0
1 001 2020-01-02 100 1.0 1.0
2 001 2020-01-03 100 1.0 1.0
0 001 2019-01-01 100 1.0 1.0
1 001 2019-01-02 100 1.0 1.0
2 001 2019-01-03 100 1.0 1.0
3 002 2020-01-01 50 2.0 3.0
4 002 2020-01-02 50 2.0 3.0
5 002 2020-01-03 50 2.0 3.0
3 002 2019-01-01 50 2.0 3.0
4 002 2019-01-02 50 2.0 3.0
5 002 2019-01-03 50 2.0 3.0
6 003 2019-02-02 20 2.0 3.0
6 004 2020-01-02 60 1.0 1.0
7 004 2020-01-03 60 1.0 1.0
what can I do to achieve the above output? I cannot find it in pandas documentation. Thanks
Upvotes: 0
Views: 729
Reputation: 153460
Use ffill, forward fills NaN value with the last non-NaN value seen in a column.
histo = histo.sort_values('Id').ffill()
Upvotes: 1