Reputation: 81
I have a pandas dataframe where I need to replace all the "yes" values with the corresponding number from a nested dictionary. The nested dictionary is composed of the row values of column 'Store' as the outer keys. The inner keys are the columns named 'A' and 'B'.
This is the dataframe:
import pandas as pd
data = [['abc', 'jan','yes','no'], ['abc', 'feb','no','yes'], ['def', 'jan', 'yes','yes'], ['def', 'feb', 'no','yes']]
df = pd.DataFrame(data, columns = ['Store', 'Month', 'A','B' ])
df
Store Month A B
0 abc jan yes no
1 abc feb no yes
2 def jan yes yes
3 def feb no yes
This is the nested dictionary:
# dict = {row value in 'Store' column:{column:point value}}
dict = {'abc':{'A':5,'B':4},'def':{'A':3,'B':2}}
This is the desired output:
Store Month A B
0 abc jan 5 no
1 abc feb no 4
2 def jan 3 2
3 def feb no 2
Upvotes: 1
Views: 262
Reputation:
Here's another option using np.where
.
(i) map
dict
(FYI, dict
is the name of a dictionary constructor, I replace it with dct
here) to "Store" to match the relevant dictionaries for each row.
(ii) flatten df[['A','B']]
to a numpy array
(iii) iterate over result from (i) and get the values of each dictionary, and flatten the resulting list using itertools.chain
(iv) use np.where
to select values based on whether the result from (ii) is "yes" or not
(v) reshape the result from (iv) into a 2D array and assign back to df[['A','B']]
from itertools import chain
mapper = df['Store'].map(dct)
flat_AB = df[['A','B']].to_numpy().flatten()
values_from_dict = list(chain.from_iterable([d.values() for d in mapper]))
df[['A','B']] = np.where(flat_AB == 'yes', values_from_dict, flat_AB).reshape(-1,2)
or you can directly use the DataFrame itself (but I think this is slower than the above method)
df[['A','B']] = np.where(df[['A','B']] == 'yes', pd.DataFrame(df['Store'].map(dct).tolist()), df[['A','B']])
Output:
Store Month A B
0 abc jan 5 no
1 abc feb no 4
2 def jan 3 2
3 def feb no 2
Upvotes: 1
Reputation: 120419
Replace 'yes'
by np.nan
and fillna
by the values of your nested dict:
d = {'abc':{'A':5,'B':4},'def':{'A':3,'B':2}}
out = df.replace({'yes': np.nan}).groupby('Store') \
.apply(lambda x: x.fillna(d[x.name])).droplevel(0)
print(out)
# Output
Store Month A B
0 abc jan 5 no
1 abc feb no 4
2 def jan 3 2
3 def feb no 2
Upvotes: 2
Reputation: 1030
Try this:
def find_num(store, col_value, col_name):
if col_value == "yes":
sub_dict = dict[store]
return sub_dict[col_name]
else:
return "no"
for col in list(df.columns):
if col == "Store" or col == "Month":
continue
df[col] = df.apply(lambda x: find_num(x['Store'], x[col], col), axis=1)
print(df)
Upvotes: 1