jmb277

Fill NaNs in one column of a df via lookup to another df using pandas

I've seen various versions of this question, but none of them seem to fit what I am attempting to do. Here's my data:

Here's the df with the NaNs:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
                   "B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"]})

       A      B      C
0  10023      ,  small
1  10040  17,-6  large
2    NaN  19,-2  large
3  12345  17,-5  small
4    NaN  37,-5  small
5    NaN      ,  large
6  10033  9,-10  small
7    NaN  19,-2  small
8    NaN   2,-5  large

Next I have the lookup df called df2:

df2 = pd.DataFrame({"B": ['17,-5', '19,-2', '37,-5', '9,-10'],
                    "A": ["10040", "54321", "12345", "10033"]})

       B      A
0  17,-5  10040
1  19,-2  54321
2  37,-5  12345
3  9,-10  10033

I would like to fill in the NaNs of column A on df by looking up df.B in df2.B and returning df2.A, such that the resulting df looks like this:

       A      B      C
0  10023      ,  small
1  10040  17,-6  large
2  54321  19,-2  large
3  12345  17,-5  small
4  12345  37,-5  small
5    NaN      ,  large
6  10033  9,-10  small
7  54321  19,-2  small
8    NaN   2,-5  large

Important caveats:

  1. The dfs do not have matching indexes
  2. The contents of df.A and df2.A are non-unique.
  3. The rows of df2 do form unique pairs.
  4. Assume that there are more columns, not shown, with NaNs.

Using pandas, the rows of interest on df would be found (I think) via df.loc[df['A'].isnull()]. This answer seemed promising, but I'm unclear where df1 in that example comes from. My actual data set is much, much larger than this and I'll have to replace several columns in this way.
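
To confirm which rows that mask selects, here is a minimal check, assuming the df defined above:

# rows of df where column A is missing
missing = df.loc[df['A'].isnull()]
print(missing)  # indices 2, 4, 5, 7 and 8 in the sample data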

Upvotes: 0

Answers (2)

Erfan

The map method from Wen-Ben will be faster, but here's another way you can solve this problem, for your convenience and knowledge.

You can use pd.merge, because this is basically a join problem. After the merge, we fillna and drop the columns we don't need.

# left-join the lookup frame on 'B'; the overlapping column 'A' gets suffixes
df_final = pd.merge(df, df2, on='B', how='left', suffixes=['_1', '_2'])
# keep the original A where present, otherwise take the looked-up value
df_final['A'] = df_final.A_1.fillna(df_final.A_2)
df_final.drop(['A_1', 'A_2'], axis=1, inplace=True)

print(df_final)
       B      C      A
0      ,  small  10023
1  17,-6  large  10040
2  19,-2  large  54321
3  17,-5  small  12345
4  37,-5  small  12345
5      ,  large    NaN
6  9,-10  small  10033
7  19,-2  small  54321
8   2,-5  large    NaN
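
Since the question mentions several more columns with NaNs, the same merge-then-fillna pattern can be wrapped in a small helper. This is only a sketch; fill_from_lookup, key and cols_to_fill are hypothetical names, and it assumes every column to fill has a counterpart of the same name in the lookup frame:

import pandas as pd

# hypothetical helper: fill NaNs in several columns of df from a lookup frame keyed on `key`
def fill_from_lookup(df, lookup, key='B', cols_to_fill=('A',)):
    merged = pd.merge(df, lookup, on=key, how='left', suffixes=['_1', '_2'])
    for col in cols_to_fill:
        # keep the original value where present, otherwise take the looked-up one
        merged[col] = merged[f'{col}_1'].fillna(merged[f'{col}_2'])
        merged = merged.drop([f'{col}_1', f'{col}_2'], axis=1)
    return merged

df_final = fill_from_lookup(df, df2, key='B', cols_to_fill=['A'])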

Upvotes: 1

BENY

Just using np.where

# where A is NaN, look up B in df2; otherwise keep the existing A
df.A = np.where(df.A.isnull(), df.B.map(df2.set_index('B').A), df.A)
df
Out[149]: 
       A      B      C
0  10023      ,  small
1  10040  17,-6  large
2  54321  19,-2  large
3  12345  17,-5  small
4  12345  37,-5  small
5    NaN      ,  large
6  10033  9,-10  small
7  54321  19,-2  small
8    NaN   2,-5  large
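
An equivalent way to write this, assuming the same df and df2 and that B is unique in df2, is to map first and let fillna touch only the missing values:

# map B to the lookup value, then fill only the NaNs in column A with it
df['A'] = df['A'].fillna(df['B'].map(df2.set_index('B')['A']))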

Upvotes: 1
