Cedric
Cedric

Reputation: 79

replacing values in a pandas dataframe with values from another dataframe based common columns

How can I replace values in a pandas dataframe with values from another dataframe based common columns.

I need to replace NaN values in dataframe1 based on the common columns of "types" and "o_period". any suggestion?

df1

types  c_years  o_periods  s_months  incidents
0       1        1          1     127.0        0.0
1       1        1          2      63.0        0.0
2       1        2          1    1095.0        3.0
3       1        2          2    1095.0        4.0
4       1        3          1    1512.0        6.0
5       1        3          2    3353.0       18.0
6       1        4          1       NaN        NaN
7       1        4          2    2244.0       11.0
8       2        1          1   44882.0       39.0
9       2        1          2   17176.0       29.0
10      2        2          1   28609.0       58.0
11      2        2          2   20370.0       53.0
12      2        3          1    7064.0       12.0
13      2        3          2   13099.0       44.0
14      2        4          1       NaN        NaN
15      2        4          2    7117.0       18.0
16      3        1          1    1179.0        1.0
17      3        1          2     552.0        1.0
18      3        2          1     781.0        0.0
19      3        2          2     676.0        1.0
20      3        3          1     783.0        6.0
21      3        3          2    1948.0        2.0
22      3        4          1       NaN        NaN
23      3        4          2     274.0        1.0
24      4        1          1     251.0        0.0
25      4        1          2     105.0        0.0
26      4        2          1     288.0        0.0
27      4        2          2     192.0        0.0
28      4        3          1     349.0        2.0
29      4        3          2    1208.0       11.0
30      4        4          1       NaN        NaN
31      4        4          2    2051.0        4.0
32      5        1          1      45.0        0.0
33      5        1          2       NaN        NaN
34      5        2          1     789.0        7.0
35      5        2          2     437.0        7.0
36      5        3          1    1157.0        5.0
37      5        3          2    2161.0       12.0
38      5        4          1       NaN        NaN
39      5        4          2     542.0        1.0

df2

 types  o_periods  s_months  incidents
0      1          1     911.0        3.0
1      1          2    1689.0        8.0
2      2          1   26852.0       36.0
3      2          2   14440.0       36.0
4      3          1     914.0        2.0
5      3          2     862.0        1.0
6      4          1     296.0        1.0
7      4          2     889.0        4.0
8      5          1     664.0        4.0
9      5          2    1047.0        7.0

df3:rows with NaN

    types  c_years  o_periods  s_months  incidents
6       1        4          1       NaN        NaN
14      2        4          1       NaN        NaN
22      3        4          1       NaN        NaN
30      4        4          1       NaN        NaN
33      5        1          2       NaN        NaN
38      5        4          1       NaN        NaN

I have tried to merge df2 with df3 but the indexing seems to reset.

Upvotes: 1

Views: 874

Answers (1)

anarchy
anarchy

Reputation: 5174

First separate the rows where you have NaN values out into a new dataframe called df3 and drop the rows where there are NaN values from df1.

Then do a left join based on the new dataframe.

df4 = pd.merge(df3,df2,how='left',on=['types','o_period'])

After that is done, append the rows from df4 back into df1.

Another way is to combine the 2 columns you want to lookup into a single column

df1["types_o"] = df1["types_o"].astype(str) + df1["o_period"].astype(str)

df2["types_o"] = df2["types_o"].astype(str) + df2["o_period"].astype(str)

Then you can do a look up on the missing values.

df1.types_o.replace('Nan', np.NaN, inplace=True)

df1.loc[df1['s_months'].isnull(),'s_months'] = df2['types_o'].map(df1.types_o)

df1.loc[df1['incidents'].isnull(),'incidents'] = df2['types_o'].map(df1.types_o)

You didn't paste any code or examples of your data which is easily reproducible so this is the best I can do.

Upvotes: 1

Related Questions