Nils
Nils

Reputation: 930

Setting subset of a pandas DataFrame by a DataFrame if a value matches

I think the easiest way to explain what I am trying to do is by showing an example:

Given a DataFrame

        V_set   V_reset     I_set   I_reset           HRS          LRS      ID
0     0.599417 -0.658417  0.000021 -0.000606  84562.252849  1097.226787  1383.0
1     0.595250 -0.684708  0.000023 -0.000617  43234.544776  1144.445368  1384.0
2     0.621229 -0.710812  0.000026 -0.000625  51719.718749  1216.609759  1385.0
3     0.625292 -0.720104  0.000029 -0.000625  40827.993527  1209.966052  1386.0
4     0.634563 -0.735937  0.000029 -0.000641  46881.785573  1219.497465  1387.0
       ...       ...       ...       ...           ...          ...     ...
1066  0.167521  0.000000  0.000581  0.000000    720.116614   708.098519  2811.0
1067  0.167360  0.000000  0.000581  0.000000    718.165882   708.284487  2812.0
1068  0.172812  0.000000  0.000278  0.000000    715.302620   708.167571  2813.0
1069  0.167729  0.000000  0.000581  0.000000    716.096291   708.333064  2814.0
1070  0.173037  0.000000  0.000278  0.000000    715.474310   707.980273  2815.0

and a subset of a second DataFrame df.loc[(df['HRS'].isnull()) & (df['wfm']=='shr'), ['HRS','LRS','V_set','V_reset','I_set','I_reset', 'ID']]:

        V_set   V_reset     I_set   I_reset           HRS          LRS      ID
1383       NaN       NaN       NaN       NaN           NaN          NaN  1383.0     
1384       NaN       NaN       NaN       NaN           NaN          NaN  1384.0 
1385       NaN       NaN       NaN       NaN           NaN          NaN  1385.0
1386       NaN       NaN       NaN       NaN           NaN          NaN  1386.0
1387       NaN       NaN       NaN       NaN           NaN          NaN  1387.0
       ...       ...       ...       ...           ...          ...     ...
2811       NaN       NaN       NaN       NaN           NaN          NaN  2811.0
2812       NaN       NaN       NaN       NaN           NaN          NaN  2812.0
2813       NaN       NaN       NaN       NaN           NaN          NaN  2813.0
2814       NaN       NaN       NaN       NaN           NaN          NaN  2814.0
2815       NaN       NaN       NaN       NaN           NaN          NaN  2815.0

I want to replace the NaN's from the second DataFrame by the first, BUT at the place where the ID matches, as I am not sure that the selected data will always be in the same order or if all IDs will be included.

I know I could do it with a for and if loop, but I am wondering if there is a faster way. If an ID form the second DataFrame is not included in the first DataFrame the values should just stay as NaN's.

Any help is highly appreciated.

Upvotes: 1

Views: 43

Answers (1)

Umar.H
Umar.H

Reputation: 23099

IIUC,

you have common column names and want to replace NaN values with values from your first df.

here's a solution using map and fillna this will work if your ID's have a 1 to 1 relationship.

df.set_index('ID',inplace=True)
for column in df.columns:
    df2[column] = df2[column].fillna(df2['ID'].map(df[column]))

print(df2)


         V_set    V_reset     I_set    I_reset           HRS          LRS  \
1383  0.599417  -0.658417  0.000021  -0.000606  84562.252849  1097.226787   
1384  0.595250  -0.684708  0.000023  -0.000617  43234.544776  1144.445368   
1385  0.621229  -0.710812  0.000026  -0.000625  51719.718749  1216.609759   
1386  0.625292  -0.720104  0.000029  -0.000625  40827.993527  1209.966052   
1387  0.634563  -0.735937  0.000029  -0.000641  46881.785573  1219.497465   
...        ...        ...       ...        ...           ...          ...   
2811  0.167521   0.000000  0.000581   0.000000    720.116614   708.098519   
2812  0.167360   0.000000  0.000581   0.000000    718.165882   708.284487   
2813  0.172812   0.000000  0.000278   0.000000    715.302620   708.167571   
2814  0.167729   0.000000  0.000581   0.000000    716.096291   708.333064   
2815  0.173037   0.000000  0.000278   0.000000    715.474310   707.980273   

          ID  
1383  1383.0  
1384  1384.0  
1385  1385.0  
1386  1386.0  
1387  1387.0  
...      NaN  
2811  2811.0  
2812  2812.0  
2813  2813.0  
2814  2814.0  
2815  2815.0  

if you want to fill the entire dataframe and your keys are unique - you can set both ID's as the index and use .fillna

df2.set_index('ID').fillna(df.set_index('ID'))
print(df2)
           V_set    V_reset     I_set    I_reset           HRS          LRS
ID                                                                         
1383.0  0.599417  -0.658417  0.000021  -0.000606  84562.252849  1097.226787
1384.0  0.595250  -0.684708  0.000023  -0.000617  43234.544776  1144.445368
1385.0  0.621229  -0.710812  0.000026  -0.000625  51719.718749  1216.609759
1386.0  0.625292  -0.720104  0.000029  -0.000625  40827.993527  1209.966052
1387.0  0.634563  -0.735937  0.000029  -0.000641  46881.785573  1219.497465
NaN          ...        ...       ...        ...           ...          ...
2811.0  0.167521   0.000000  0.000581   0.000000    720.116614   708.098519
2812.0  0.167360   0.000000  0.000581   0.000000    718.165882   708.284487
2813.0  0.172812   0.000000  0.000278   0.000000    715.302620   708.167571
2814.0  0.167729   0.000000  0.000581   0.000000    716.096291   708.333064
2815.0  0.173037   0.000000  0.000278   0.000000    715.474310   707.980273

Upvotes: 2

Related Questions