YummyBlueberries
YummyBlueberries

Reputation: 37

Fill dataframe column with a value if multiple columns match values in a dictionary

I have two dataframes - one large dataframe with multiple categorical columns and one column with missing values, and another that's sort of a dictionary with the same categorical columns and one column with a key value.

Essentially, I want to fill the missing values in the large dataframe with the key value in the second if all the categorical columns match.

Missing value df:

    Color      Number    Letter       Value
0   Red          2          B          NaN
1   Green        2          A          NaN 
2   Red          2          B          NaN
3   Red          1          B          NaN
4   Green        1          A          NaN
5   Red          2          B          NaN
6   Green        1          B          NaN
7   Green        2          A          NaN

Dictionary df:

    Color      Number    Letter       Value
0   Red          1          A          10
1   Red          1          B           4
2   Red          2          A           3
3   Red          2          B          15
4   Green        1          A          21
5   Green        1          B           9
6   Green        2          A          22
7   Green        2          B           1

Desired df:

0   Red          2          B          15
1   Green        2          A          22 
2   Red          2          B          15
3   Red          1          B          4
4   Green        1          A          21
5   Red          2          B          15
6   Green        1          B          9
7   Green        2          A          22

I'm not sure if I should have the 'dictionary df' as an actual dictionary, or keep it as a dataframe (it's pulled from a csv).

Is this possible to do cleanly without a myriad of if else statements?

Thanks!

Upvotes: 0

Views: 1257

Answers (3)

Todd
Todd

Reputation: 5385

Does this work?

>>> df_1[['Color', 'Number', 'Letter']].merge(df_2, 
...                                           on=('Color', 'Number', 'Letter'),   
...                                           how='left')
   Color  Number Letter  Value
0    Red       2      B     15
1  Green       2      A     22
2    Red       2      B     15
3    Red       1      B      4
4  Green       1      A     21
5    Red       2      B     15
6  Green       1      B      9
7  Green       2      A     22

Thought it worth mentioning - a very simple way to convert examples from stackoverflow pandas questions into a dataframe, just cut and paste it into a string like this:

>>> df_1 = pd.read_csv(StringIO("""
...     Color      Number    Letter       Value
... 0   Red          2          B          NaN
... 1   Green        2          A          NaN
... 2   Red          2          B          NaN
... 3   Red          1          B          NaN
... 4   Green        1          A          NaN
... 5   Red          2          B          NaN
... 6   Green        1          B          NaN
... 7   Green        2          A          NaN
... """), sep=r'\s+')

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

Try:

missing_df.reset_index()[['index', 'Color', 'Number', 'Letter']]\
          .merge(dict_df, on = ['Color', 'Number', 'Letter'])\
          .set_index('index').reindex(missing_df.index)

Output:

  Color  Number Letter  Value
0    Red       2      B     15
1  Green       2      A     22
2    Red       2      B     15
3    Red       1      B      4
4  Green       1      A     21
5    Red       2      B     15
6  Green       1      B      9
7  Green       2      A     22

Upvotes: 1

Jai
Jai

Reputation: 849

I will be calling Missing value df as: df and Dictionary df as: ddf, considering both as dataframes

First drop the null values column from Missing value df:

df.drop(['Value'], axis=1)

Secondly run the below command, which should do the task for you.

df.assign(Value=ddf['Value'])

Upvotes: 0

Related Questions