Reputation: 568
I seen various versions of this question but none of them seem to fit with what I am attempting to do: here's my data:
Here's the df with the NaN
s:
df = pd.DataFrame({"A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
"B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
"C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"]})
A B C
0 10023 , small
1 10040 17,-6 large
2 NaN 19,-2 large
3 12345 17,-5 small
4 NaN 37,-5 small
5 NaN , large
6 10033 9,-10 small
7 NaN 19,-2 small
8 NaN 2,-5 large
Next I have the lookup df called df2
:
df2 = pd.DataFrame({"B": ['17,-5', '19,-2', '37,-5', '9,-10'],
"A": ["10040", "54321", "12345", "10033"]})
B A
0 17,-5 10040
1 19,-2 54321
2 37,-5 12345
3 9,-10 10033
I would like to fill in the NaN
s of column A
on df
by looking up column df2.B
and returning df2.A
such that the resulting dfr
looks like this:
A B C
0 10023 , small
1 10040 17,-6 large
2 54321 19,-2 large
3 10040 17,-5 small
4 12345 37,-5 small
5 NaN , large
6 10033 9,-10 small
7 54321 19,-2 small
8 NaN 2,-5 large
Important caveats:
df
s do not have matching indexesdf.A
and df2.A
are non-unique()df2
do make up unique pairs.NaN
s.Using pandas, the rows of interest on df
would be found (I think) via: df.loc[df['A'].isnull(),]
. This answer seemed promising but I'm unclear where df1
in that example comes from. My actual data set is much, much larger than this and I'll have to be doing replacing several columns in this way.
Upvotes: 0
Views: 959
Reputation: 42916
The map
method from Wen-Ben will be faster in terms of speed, but here's another way you can solve this problem, just for your convenience and knowledge
You can use pd.merge
, because this is basically a join
problem.
After the merge, we fillna and drop the columns we dont need.
df_final = pd.merge(df, df2, on='B', how='left', suffixes=['_1','_2'])
df_final['A'] = df_final.A_1.fillna(df_final.A_2)
df_final.drop(['A_1', 'A_2'], axis=1, inplace=True)
print(df_final)
B C A
0 , small 10023
1 17,-6 large 10040
2 19,-2 large 54321
3 17,-5 small 12345
4 37,-5 small 12345
5 , large NaN
6 9,-10 small 10033
7 19,-2 small 54321
8 2,-5 large NaN
Upvotes: 1
Reputation: 323326
Just using np.where
df.A=np.where(df.A.isnull(),df.B.map(df2.set_index('B').A),df.A)
df
Out[149]:
A B C
0 10023 , small
1 10040 17,-6 large
2 54321 19,-2 large
3 12345 17,-5 small
4 12345 37,-5 small
5 NaN , large
6 10033 9,-10 small
7 54321 19,-2 small
8 NaN 2,-5 large
Upvotes: 1