Reputation: 37
I have two dataframes - one large dataframe with multiple categorical columns and one column with missing values, and another that's sort of a dictionary with the same categorical columns and one column with a key value.
Essentially, I want to fill the missing values in the large dataframe with the key value in the second if all the categorical columns match.
Missing value df:
Color Number Letter Value
0 Red 2 B NaN
1 Green 2 A NaN
2 Red 2 B NaN
3 Red 1 B NaN
4 Green 1 A NaN
5 Red 2 B NaN
6 Green 1 B NaN
7 Green 2 A NaN
Dictionary df:
Color Number Letter Value
0 Red 1 A 10
1 Red 1 B 4
2 Red 2 A 3
3 Red 2 B 15
4 Green 1 A 21
5 Green 1 B 9
6 Green 2 A 22
7 Green 2 B 1
Desired df:
0 Red 2 B 15
1 Green 2 A 22
2 Red 2 B 15
3 Red 1 B 4
4 Green 1 A 21
5 Red 2 B 15
6 Green 1 B 9
7 Green 2 A 22
I'm not sure if I should have the 'dictionary df' as an actual dictionary, or keep it as a dataframe (it's pulled from a csv).
Is this possible to do cleanly without a myriad of if else statements?
Thanks!
Upvotes: 0
Views: 1257
Reputation: 5385
Does this work?
>>> df_1[['Color', 'Number', 'Letter']].merge(df_2,
... on=('Color', 'Number', 'Letter'),
... how='left')
Color Number Letter Value
0 Red 2 B 15
1 Green 2 A 22
2 Red 2 B 15
3 Red 1 B 4
4 Green 1 A 21
5 Red 2 B 15
6 Green 1 B 9
7 Green 2 A 22
Thought it worth mentioning - a very simple way to convert examples from stackoverflow pandas questions into a dataframe, just cut and paste it into a string like this:
>>> df_1 = pd.read_csv(StringIO("""
... Color Number Letter Value
... 0 Red 2 B NaN
... 1 Green 2 A NaN
... 2 Red 2 B NaN
... 3 Red 1 B NaN
... 4 Green 1 A NaN
... 5 Red 2 B NaN
... 6 Green 1 B NaN
... 7 Green 2 A NaN
... """), sep=r'\s+')
Upvotes: 1
Reputation: 153460
Try:
missing_df.reset_index()[['index', 'Color', 'Number', 'Letter']]\
.merge(dict_df, on = ['Color', 'Number', 'Letter'])\
.set_index('index').reindex(missing_df.index)
Output:
Color Number Letter Value
0 Red 2 B 15
1 Green 2 A 22
2 Red 2 B 15
3 Red 1 B 4
4 Green 1 A 21
5 Red 2 B 15
6 Green 1 B 9
7 Green 2 A 22
Upvotes: 1
Reputation: 849
I will be calling Missing value df as: df and Dictionary df as: ddf, considering both as dataframes
First drop the null values column from Missing value df:
df.drop(['Value'], axis=1)
Secondly run the below command, which should do the task for you.
df.assign(Value=ddf['Value'])
Upvotes: 0