bkt619
bkt619

Reputation: 81

Utilizing nested dictionary to replace specific values in a Pandas Dataframe

I have a pandas dataframe where I need to replace all the "yes" values with the corresponding number from a nested dictionary. The nested dictionary is composed of the row values of column 'Store' as the outer keys. The inner keys are the columns named 'A' and 'B'.

This is the dataframe:

import pandas as pd

data = [['abc', 'jan','yes','no'], ['abc', 'feb','no','yes'], ['def', 'jan', 'yes','yes'],  ['def', 'feb', 'no','yes']]

df = pd.DataFrame(data, columns = ['Store', 'Month', 'A','B' ])

df
 
  Store Month    A    B
0   abc   jan  yes   no
1   abc   feb   no  yes
2   def   jan  yes  yes
3   def   feb   no  yes

This is the nested dictionary:

# dict = {row value in 'Store' column:{column:point value}}
dict = {'abc':{'A':5,'B':4},'def':{'A':3,'B':2}}

This is the desired output:

  Store Month    A    B
0   abc   jan    5   no
1   abc   feb   no    4
2   def   jan    3    2
3   def   feb   no    2

Upvotes: 1

Views: 262

Answers (3)

user7864386
user7864386

Reputation:

Here's another option using np.where.

(i) map dict (FYI, dict is the name of a dictionary constructor, I replace it with dct here) to "Store" to match the relevant dictionaries for each row.

(ii) flatten df[['A','B']] to a numpy array

(iii) iterate over result from (i) and get the values of each dictionary, and flatten the resulting list using itertools.chain

(iv) use np.where to select values based on whether the result from (ii) is "yes" or not

(v) reshape the result from (iv) into a 2D array and assign back to df[['A','B']]

from itertools import chain
mapper = df['Store'].map(dct)
flat_AB = df[['A','B']].to_numpy().flatten()
values_from_dict = list(chain.from_iterable([d.values() for d in mapper]))
df[['A','B']] = np.where(flat_AB == 'yes', values_from_dict, flat_AB).reshape(-1,2)

or you can directly use the DataFrame itself (but I think this is slower than the above method)

df[['A','B']] = np.where(df[['A','B']] == 'yes', pd.DataFrame(df['Store'].map(dct).tolist()), df[['A','B']])

Output:

  Store Month   A   B
0   abc   jan   5  no
1   abc   feb  no   4
2   def   jan   3   2
3   def   feb  no   2

Upvotes: 1

Corralien
Corralien

Reputation: 120419

Replace 'yes' by np.nan and fillna by the values of your nested dict:

d = {'abc':{'A':5,'B':4},'def':{'A':3,'B':2}}
out = df.replace({'yes': np.nan}).groupby('Store') \
        .apply(lambda x: x.fillna(d[x.name])).droplevel(0)
print(out)

# Output
  Store Month   A   B
0   abc   jan   5  no
1   abc   feb  no   4
2   def   jan   3   2
3   def   feb  no   2

Upvotes: 2

Tomer S
Tomer S

Reputation: 1030

Try this:

def find_num(store, col_value, col_name):
    if col_value == "yes":
        sub_dict = dict[store]
        return sub_dict[col_name]
    else:
        return "no"


for col in list(df.columns):
    if col == "Store" or col == "Month":
        continue
    df[col] = df.apply(lambda x: find_num(x['Store'], x[col], col), axis=1)

print(df)

Upvotes: 1

Related Questions