Murcielago
Murcielago

Reputation: 1005

Pandas: how to prevent df.append() from returning NaN values

I am trying to append the content of one dataframe into another. Here is basic example of what I am working with:

import numpy as np
import pandas as pd



df1 = pd.DataFrame({'Id': ['001','001','001','002','002','002','004','004'],
                    'Date':['2020-01-01','2020-01-02','2020-01-03','2020-01-01','2020-01-02','2020-01-03','2020-01-02','2020-01-03'],
                    'Quantity': [100,100,100,50,50,50,60,60],
                    'fx' :[1,1,1,2,2,2,1,1],
                    'fy' : [1,1,1,3,3,3,1,1]})

df2 = pd.DataFrame({'Id': ['001','001','001','002','002','002', '003'],
                    'Date':['2019-01-01','2019-01-02','2019-01-03','2019-01-01','2019-01-02','2019-01-03','2019-02-02'],
                    'Quantity': [100,100,100,50,50,50,20]})

Now I want to append the content of df2 into df1, but the issue is that it results in some NaN here and there in df1

histo = df1.append(df2)
histo = histo.sort_values('Id')
print(histo)

    Id        Date  Quantity   fx   fy
0  001  2020-01-01       100  1.0  1.0
1  001  2020-01-02       100  1.0  1.0
2  001  2020-01-03       100  1.0  1.0
0  001  2019-01-01       100  NaN  NaN
1  001  2019-01-02       100  NaN  NaN
2  001  2019-01-03       100  NaN  NaN
3  002  2020-01-01        50  2.0  3.0
4  002  2020-01-02        50  2.0  3.0
5  002  2020-01-03        50  2.0  3.0
3  002  2019-01-01        50  NaN  NaN
4  002  2019-01-02        50  NaN  NaN
5  002  2019-01-03        50  NaN  NaN
6  003  2019-02-02        20  NaN  NaN
6  004  2020-01-02        60  1.0  1.0
7  004  2020-01-03        60  1.0  1.0

the output that I want to achieve is that for each 'Id' row, the values of fx and fy continue being the same. the result would look like this:


    Id        Date  Quantity   fx   fy
0  001  2020-01-01       100  1.0  1.0
1  001  2020-01-02       100  1.0  1.0
2  001  2020-01-03       100  1.0  1.0
0  001  2019-01-01       100  1.0  1.0
1  001  2019-01-02       100  1.0  1.0
2  001  2019-01-03       100  1.0  1.0
3  002  2020-01-01        50  2.0  3.0
4  002  2020-01-02        50  2.0  3.0
5  002  2020-01-03        50  2.0  3.0
3  002  2019-01-01        50  2.0  3.0
4  002  2019-01-02        50  2.0  3.0
5  002  2019-01-03        50  2.0  3.0
6  003  2019-02-02        20  2.0  3.0
6  004  2020-01-02        60  1.0  1.0
7  004  2020-01-03        60  1.0  1.0

what can I do to achieve the above output? I cannot find it in pandas documentation. Thanks

Upvotes: 0

Views: 729

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

Use ffill, forward fills NaN value with the last non-NaN value seen in a column.

histo = histo.sort_values('Id').ffill()

Upvotes: 1

Related Questions