Reputation: 71
I have a dataset in which several differently named codes belong to the same category. I was thinking of creating a dictionary that maps each category (the key) to a list of codes (the values), and then replacing each code in the column with its key. Here is what I have and what I want to achieve.
Dictionary:
sspdict={'Eva':["M-EV", "G-EV"],'Re Sci': ['G-RESC', 'M-RESC', 'S-RESC', 'D-RESC'], 'Ed':['G-PO' , 'M-PO'], 'Global':['C-GCC', 'D-GCLA', 'C-LACL']}
Dataset:
Col1 Col2 Col3
12 No M-EV
22 Yes G-EV
23 Yes G-RESC
35 No M-PO
34 Yes D-GCLA
46 No S-RESC
89 No G-PO
90 Yes C-GCC
Desired outcome:
Col1 Col2 Col3
12 No Eva
22 Yes Eva
23 Yes Re Sci
35 No Ed
34 Yes Global
46 No Re Sci
89 No Ed
90 Yes Global
Could you please help?
Upvotes: 1
Views: 715
Reputation: 153460
Let's try this one-liner (`assign` returns a new DataFrame, so assign the result back to `df` if you want to keep it):
df.assign(Col3 = df['Col3'].apply(lambda x: [key for key, value in sspdict.items() if x in value][0]))
Or use a generator expression with `next`, which stops at the first matching category instead of building a full list:
df.assign(Col3 = df['Col3'].apply(lambda x: next(key for key, value in sspdict.items() if x in value)))
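Both one-liners fail on a code that appears in no list: the list version raises `IndexError` and the `next` version raises `StopIteration`. A minimal sketch, assuming unmatched codes should simply be left unchanged, passes a default as the second argument to `next`. The helper name `to_category` and the extra code `X-NEW` are hypothetical, added only to show the fallback.

```python
import pandas as pd

sspdict = {'Eva': ["M-EV", "G-EV"], 'Re Sci': ['G-RESC', 'M-RESC', 'S-RESC', 'D-RESC'],
           'Ed': ['G-PO', 'M-PO'], 'Global': ['C-GCC', 'D-GCLA', 'C-LACL']}

# Rows from the question, plus one hypothetical code ('X-NEW') that is in no list.
df = pd.DataFrame({
    'Col1': [12, 22, 23, 35, 34, 46, 89, 90, 91],
    'Col2': ['No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'No'],
    'Col3': ['M-EV', 'G-EV', 'G-RESC', 'M-PO', 'D-GCLA', 'S-RESC', 'G-PO', 'C-GCC', 'X-NEW'],
})

def to_category(code):
    # next() returns the default (the code itself) instead of raising StopIteration.
    return next((key for key, values in sspdict.items() if code in values), code)

result = df.assign(Col3=df['Col3'].apply(to_category))
print(result)
```

The last row keeps `X-NEW` in `Col3`, while every other row gets its category.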
Output:
Col1 Col2 Col3
0 12 No Eva
1 22 Yes Eva
2 23 Yes Re Sci
3 35 No Ed
4 34 Yes Global
5 46 No Re Sci
6 89 No Ed
7 90 Yes Global
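Both versions scan every category list for every row. A common pandas alternative (a swapped-in technique, not part of the answer above) inverts `sspdict` once into a code-to-category dictionary and passes it to `Series.map`, which performs one dictionary lookup per row. `map` turns codes with no entry into `NaN`, so `.fillna(df['Col3'])` is used here to keep the original code in that case. Note that this sketch overwrites the column in place rather than using `assign`.

```python
import pandas as pd

sspdict = {'Eva': ["M-EV", "G-EV"], 'Re Sci': ['G-RESC', 'M-RESC', 'S-RESC', 'D-RESC'],
           'Ed': ['G-PO', 'M-PO'], 'Global': ['C-GCC', 'D-GCLA', 'C-LACL']}

df = pd.DataFrame({
    'Col1': [12, 22, 23, 35, 34, 46, 89, 90],
    'Col2': ['No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes'],
    'Col3': ['M-EV', 'G-EV', 'G-RESC', 'M-PO', 'D-GCLA', 'S-RESC', 'G-PO', 'C-GCC'],
})

# Invert once: {'M-EV': 'Eva', 'G-EV': 'Eva', 'G-RESC': 'Re Sci', ...}
code_to_category = {code: category for category, codes in sspdict.items() for code in codes}

# map() looks each value up in the dict; fillna() keeps any code that has no category.
df['Col3'] = df['Col3'].map(code_to_category).fillna(df['Col3'])
print(df)
```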
Upvotes: 3
Reputation:
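I would recommend changing the sspdict data structure: invert it so that each code maps directly to its category. Building the inverted dictionary takes one pass over all codes, after which each row needs a single constant-time lookup instead of a scan through every category list. It could look something like this: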
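dataset = '''Col1 Col2 Col3
12 No M-EV
22 Yes G-EV
23 Yes G-RESC
35 No M-PO
34 Yes D-GCLA
46 No S-RESC
89 No G-PO
90 Yes C-GCC
'''
sspdict = {'Eva': ["M-EV", "G-EV"], 'Re Sci': ['G-RESC', 'M-RESC', 'S-RESC', 'D-RESC'], 'Ed': ['G-PO', 'M-PO'], 'Global': ['C-GCC', 'D-GCLA', 'C-LACL']}

# Invert the mapping: each code becomes a key pointing at its category.
lookup_dict = {value: key for key, values in sspdict.items() for value in values}

lines = dataset.splitlines()
result = lines[0] + '\n'  # keep the header row
for line in lines[1:]:
    # Split off only the last column; line.rstrip(key) would strip a set of
    # characters from the end, not the exact suffix.
    prefix, key = line.rsplit(maxsplit=1)
    result += prefix + ' ' + lookup_dict[key] + '\n'
print(result)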
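One caveat of inverting the dictionary: if the same code were listed under two categories, the comprehension would silently keep whichever category comes last. A minimal sketch (the function name `invert_categories` and the code `X-NEW` are hypothetical) that raises on such a conflict, and uses `dict.get(code, code)` so an unknown code is left unchanged instead of raising `KeyError`:

```python
def invert_categories(categories):
    """Build a code -> category dict, refusing codes listed under two categories."""
    lookup = {}
    for category, codes in categories.items():
        for code in codes:
            if code in lookup and lookup[code] != category:
                raise ValueError(f"{code!r} is listed under both {lookup[code]!r} and {category!r}")
            lookup[code] = category
    return lookup

sspdict = {'Eva': ["M-EV", "G-EV"], 'Re Sci': ['G-RESC', 'M-RESC', 'S-RESC', 'D-RESC'],
           'Ed': ['G-PO', 'M-PO'], 'Global': ['C-GCC', 'D-GCLA', 'C-LACL']}
lookup_dict = invert_categories(sspdict)

# dict.get(code, code) returns the code itself when it has no category.
print(lookup_dict.get('D-GCLA', 'D-GCLA'))  # Global
print(lookup_dict.get('X-NEW', 'X-NEW'))    # X-NEW
```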
Upvotes: 0