Reputation: 7245
I have a pandas dataframe that looks like the following
df
time case1 case2 case3
0 5 house bank atm
1 3 bank house pharmacy
2 10 bank bank atm
3 20 house pharmacy house
For each case column, I want to add a new column holding the average and standard deviation of the category found in that row.
I know that
p_house = [20, 10]
p_bank = [5, 1]
p_atm = [3, 1]
p_pharmacy = [10, 5]
I would like to have a pandas dataframe like the following
df
time case1 case2 case3 p1 p2 p3
0 5 house bank atm [20, 10] [5, 1] [3, 1]
1 3 bank house pharmacy [5, 1] [20, 10] [10, 5]
2 10 bank bank atm [5, 1] [5, 1] [3, 1]
3 20 house pharmacy house [20, 10] [10, 5] [20, 10]
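For anyone who wants to reproduce this, here is a minimal sketch that builds the sample frame. Only the values come from the tables above; building it from a dict of lists is just an assumption about how the data is stored.

```python
import pandas as pd

# Sample data copied from the first table in the question.
df = pd.DataFrame({
    "time": [5, 3, 10, 20],
    "case1": ["house", "bank", "bank", "house"],
    "case2": ["bank", "house", "bank", "pharmacy"],
    "case3": ["atm", "pharmacy", "atm", "house"],
})

# [average, standard deviation] per category, as listed in the question.
p_house = [20, 10]
p_bank = [5, 1]
p_atm = [3, 1]
p_pharmacy = [10, 5]

print(df)
```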
Upvotes: 2
Views: 247
Reputation: 2757
Use applymap and dict.get to achieve a DataFrame-level map (since pandas 2.1, applymap is deprecated in favor of DataFrame.map):
dict_map = {'house': [20, 10],
            'bank': [5, 1],
            'atm': [3, 1],
            'pharmacy': [10, 5]}
df.join(df.filter(like='case')
          .applymap(dict_map.get)
          .rename(columns=lambda x: x.replace('case', 'p')))
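A self-contained sketch of the same idea, in case it helps to run it end to end. It assumes the sample frame from the question. The `elementwise` and `result` names are mine, and the `hasattr` check is only one possible way to cope with `applymap` being renamed to `DataFrame.map` in pandas 2.1.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "time": [5, 3, 10, 20],
    "case1": ["house", "bank", "bank", "house"],
    "case2": ["bank", "house", "bank", "pharmacy"],
    "case3": ["atm", "pharmacy", "atm", "house"],
})

dict_map = {"house": [20, 10], "bank": [5, 1], "atm": [3, 1], "pharmacy": [10, 5]}

# Only the case columns should be looked up in the dictionary.
cases = df.filter(like="case")

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
elementwise = cases.map if hasattr(cases, "map") else cases.applymap

# Look up every cell, rename case1 -> p1 etc., and attach to the original frame.
result = df.join(
    elementwise(dict_map.get).rename(columns=lambda c: c.replace("case", "p"))
)
print(result)
```

Because `dict.get` returns `None` for unknown keys, a category missing from `dict_map` would show up as `None` rather than raise.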
Upvotes: 1
Reputation: 148880
You could stack, merge with an auxiliary dataframe, and unstack:
prob = pd.DataFrame({'case': ['house', 'bank', 'atm', 'pharmacy'],
                     'prob': [[20, 10], [5, 1], [3, 1], [10, 5]]})
df.set_index('time', append=True).stack().reset_index().merge(
    prob, left_on=0, right_on='case').drop(columns=0).set_index(
    ['level_0', 'time', 'level_2']).unstack()
gives:
case prob
level_2 case1 case2 case3 case1 case2 case3
level_0 time
0 5 house bank atm [20, 10] [5, 1] [3, 1]
1 3 bank house pharmacy [5, 1] [20, 10] [10, 5]
2 10 bank bank atm [5, 1] [5, 1] [3, 1]
3 20 house pharmacy house [20, 10] [10, 5] [20, 10]
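The output above keeps a two-level column header. If the flat p1..p3 layout from the question is wanted, a hedged sketch of the same reshape, merge, reshape idea follows. Note that it swaps in melt and pivot for stack and unstack, and the names `long`, `wide`, `slot` and `result` are mine, not part of the answer.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "time": [5, 3, 10, 20],
    "case1": ["house", "bank", "bank", "house"],
    "case2": ["bank", "house", "bank", "pharmacy"],
    "case3": ["atm", "pharmacy", "atm", "house"],
})

# Auxiliary lookup table: one row per category.
prob = pd.DataFrame({
    "case": ["house", "bank", "atm", "pharmacy"],
    "prob": [[20, 10], [5, 1], [3, 1], [10, 5]],
})

# Long format: one row per (original row, case column) pair.
long = (
    df.reset_index()
      .melt(id_vars=["index", "time"], var_name="slot", value_name="case")
      .merge(prob, on="case", how="left")
)

# Back to wide format, keeping only the looked-up values.
wide = long.pivot(index="index", columns="slot", values="prob")
wide.columns = [c.replace("case", "p") for c in wide.columns]

# Align on the original row index.
result = df.join(wide)
print(result)
```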
Upvotes: 1
Reputation: 75080
You can create a dictionary and map the values:
p_house = [20, 10]
p_bank = [5, 1]
p_atm = [3, 1]
p_pharmacy = [10, 5]
d = {'house':p_house,'bank':p_bank,'atm':p_atm,'pharmacy':p_pharmacy}
df.assign(**df.stack().map(d).dropna().unstack().add_prefix('p_'))
time case1 case2 case3 p_case1 p_case2 p_case3
0 5 house bank atm [20, 10] [5, 1] [3, 1]
1 3 bank house pharmacy [5, 1] [20, 10] [10, 5]
2 10 bank bank atm [5, 1] [5, 1] [3, 1]
3 20 house pharmacy house [20, 10] [10, 5] [20, 10]
Upvotes: 2
Reputation: 3010
You can use map on a Series to add the values. I think this is simpler than the example above.
vals = {'house': [20, 10], 'bank': [5, 1], 'atm': [3, 1], 'pharmacy': [10, 5]}
for idx, col in enumerate(df.columns[1:]):
    df[f'p{idx+1}'] = df[col].map(vals)
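One caveat: `df.columns[1:]` assumes that every column after time is a case column, so running the loop a second time would also try to map the freshly added p columns. A small sketch that picks the case columns by name instead; the `startswith("case")` filter and the `case_cols` name are assumptions about the naming, not something the question guarantees.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "time": [5, 3, 10, 20],
    "case1": ["house", "bank", "bank", "house"],
    "case2": ["bank", "house", "bank", "pharmacy"],
    "case3": ["atm", "pharmacy", "atm", "house"],
})

vals = {"house": [20, 10], "bank": [5, 1], "atm": [3, 1], "pharmacy": [10, 5]}

# Select the source columns by name so the loop is safe to rerun.
case_cols = [c for c in df.columns if c.startswith("case")]

# Series.map looks up each value of the column in the dictionary.
for idx, col in enumerate(case_cols, start=1):
    df[f"p{idx}"] = df[col].map(vals)

print(df)
```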
Upvotes: 2
Reputation: 23099
IIUC, you can use a dictionary, then stack and map, then rejoin along the time index.
map_dict = {'house': [20, 10],
            'bank': [5, 1],
            'atm': [3, 1],
            'pharmacy': [10, 5]}
df1 = df.set_index('time').join(
    df.set_index('time').stack().map(map_dict).unstack()
      .rename(columns=lambda x: x.replace('case', 'p')),
    how='outer')
case1 case2 case3 p1 p2 p3
time
5 house bank atm [20, 10] [5, 1] [3, 1]
3 bank house pharmacy [5, 1] [20, 10] [10, 5]
10 bank bank atm [5, 1] [5, 1] [3, 1]
20 house pharmacy house [20, 10] [10, 5] [20, 10]
Upvotes: 2
Reputation: 88226
You're probably better off defining a dictionary and using the actual values to look up:
from operator import itemgetter
d = {'house':[20, 10], 'bank':[5, 1], 'atm':[3,1], 'pharmacy':[10,5]}
# look up each row's categories, then transpose rows into columns
cols = list(zip(*(itemgetter(*row)(d) for row in df.loc[:, 'case1':].values)))
for ix, col in enumerate(['p1', 'p2', 'p3']):
    df[col] = list(cols[ix])
print(df)
time case1 case2 case3 p1 p2 p3
0 5 house bank atm [20, 10] [5, 1] [3, 1]
1 3 bank house pharmacy [5, 1] [20, 10] [10, 5]
2 10 bank bank atm [5, 1] [5, 1] [3, 1]
3 20 house pharmacy house [20, 10] [10, 5] [20, 10]
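To make the lookup step less opaque, here is a small pandas-free sketch of what itemgetter and zip do in the snippet above. The `rows`, `looked_up` and `columns` names are mine, and only the first two rows of the sample data are used.

```python
from operator import itemgetter

d = {"house": [20, 10], "bank": [5, 1], "atm": [3, 1], "pharmacy": [10, 5]}

# First two rows of the case columns from the question.
rows = [
    ("house", "bank", "atm"),
    ("bank", "house", "pharmacy"),
]

# itemgetter(*row) builds a callable that fetches all of the row's keys at once
# and returns them as a tuple: one looked-up list per case column.
looked_up = [itemgetter(*row)(d) for row in rows]
print(looked_up[0])  # ([20, 10], [5, 1], [3, 1])

# zip(*...) transposes row-wise tuples into column-wise tuples,
# so columns[0] holds the values destined for p1, columns[1] for p2, and so on.
columns = list(zip(*looked_up))
print(columns[0])  # ([20, 10], [5, 1])
```

Note that itemgetter called with a single key returns the bare value rather than a one-element tuple, so this pattern needs at least two case columns to behave as shown.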
Upvotes: 2