Omido

Reputation: 71

Python: Assign multiple values to a dictionary key and replace column values with the keys

I have a dataset whose values fall into the same categories but have different names. I was thinking of creating a dictionary that maps each key to multiple values and then replacing the values in the column with the corresponding key. Here is what I have and what I want to achieve.

Dictionary:

sspdict = {
    'Eva': ['M-EV', 'G-EV'],
    'Re Sci': ['G-RESC', 'M-RESC', 'S-RESC', 'D-RESC'],
    'Ed': ['G-PO', 'M-PO'],
    'Global': ['C-GCC', 'D-GCLA', 'C-LACL'],
}

Dataset:

Col1  Col2  Col3
12    No     M-EV
22    Yes    G-EV
23    Yes    G-RESC
35    No     M-PO
34    Yes    D-GCLA
46    No     S-RESC
89    No     G-PO
90    Yes    C-GCC

Desired outcome:

Col1  Col2  Col3
12    No     Eva
22    Yes    Eva
23    Yes    Re Sci
35    No     Ed
34    Yes    Global
46    No     Re Sci
89    No     Ed
90    Yes    Global

Could you please help?

Upvotes: 1

Views: 715

Answers (2)

Scott Boston

Reputation: 153460

Let's try this one-liner:

df.assign(Col3 = df['Col3'].apply(lambda x: [key for key, value in sspdict.items() if x in value][0]))

Or let's use a generator:

df.assign(Col3 = df['Col3'].apply(lambda x: next(key for key, value in sspdict.items() if x in value)))

Output:

   Col1 Col2    Col3
0    12   No     Eva
1    22  Yes     Eva
2    23  Yes  Re Sci
3    35   No      Ed
4    34  Yes  Global
5    46   No  Re Sci
6    89   No      Ed
7    90  Yes  Global
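Both lambdas above assume every code in Col3 appears in one of the sspdict lists: when a code is missing, the list version raises IndexError and the bare next() raises StopIteration. A minimal sketch of a fallback follows, assuming you want unknown codes left unchanged. The helper name find_category and the code 'X-UNKNOWN' are made up for illustration.

```python
sspdict = {
    'Eva': ['M-EV', 'G-EV'],
    'Re Sci': ['G-RESC', 'M-RESC', 'S-RESC', 'D-RESC'],
    'Ed': ['G-PO', 'M-PO'],
    'Global': ['C-GCC', 'D-GCLA', 'C-LACL'],
}


def find_category(code, mapping=sspdict):
    """Return the first key whose list contains code, or code itself if none does."""
    # The second argument to next() is a default, so a missing code raises nothing.
    return next((key for key, values in mapping.items() if code in values), code)


# 'X-UNKNOWN' is a hypothetical code that is not in sspdict.
codes = ['M-EV', 'G-RESC', 'X-UNKNOWN']
categories = [find_category(c) for c in codes]
print(categories)
```

The same helper could be passed to df['Col3'].apply(find_category) in place of the lambda.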

Upvotes: 3

user1785721

I would recommend changing the sspdict data structure into a flat lookup from each value to its key. That simplifies the code and will probably improve performance, provided sspdict is not bigger than the input dataset. It could look something like this:

dataset = '''Col1  Col2  Col3
12    No     M-EV
22    Yes    G-EV
23    Yes    G-RESC
35    No     M-PO
34    Yes    D-GCLA
46    No     S-RESC
89    No     G-PO
90    Yes    C-GCC
'''

sspdict = {'Eva':["M-EV", "G-EV"],'Re Sci': ['G-RESC', 'M-RESC', 'S-RESC', 'D-RESC'], 'Ed':['G-PO' , 'M-PO'], 'Global':['C-GCC', 'D-GCLA', 'C-LACL']}

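# Invert sspdict so that each code maps directly to its category name.
lookup_dict = {value: key for key, values in sspdict.items() for value in values}

# Keep the header row so the output matches the desired layout.
header, *rows = dataset.splitlines()
result = [header]
for line in rows:
    line = line.rstrip()
    key = line.split()[2]
    # Slice the code off the end; str.rstrip(key) strips a set of characters, not a suffix.
    result.append(line[:-len(key)] + lookup_dict[key])

print('\n'.join(result))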
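If the data is already in a pandas DataFrame, as in the question, the same inverted dictionary can be applied with Series.map instead of string handling. This is a sketch that assumes pandas is installed and rebuilds the question's dataset by hand. Note that map turns any code missing from lookup_dict into NaN.

```python
import pandas as pd

sspdict = {
    'Eva': ['M-EV', 'G-EV'],
    'Re Sci': ['G-RESC', 'M-RESC', 'S-RESC', 'D-RESC'],
    'Ed': ['G-PO', 'M-PO'],
    'Global': ['C-GCC', 'D-GCLA', 'C-LACL'],
}

# The dataset from the question, rebuilt as a DataFrame.
df = pd.DataFrame({
    'Col1': [12, 22, 23, 35, 34, 46, 89, 90],
    'Col2': ['No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes'],
    'Col3': ['M-EV', 'G-EV', 'G-RESC', 'M-PO', 'D-GCLA', 'S-RESC', 'G-PO', 'C-GCC'],
})

# Invert sspdict: each code becomes a key pointing at its category name.
lookup_dict = {value: key for key, values in sspdict.items() for value in values}

# Look up every code in Col3; codes without an entry would become NaN.
df['Col3'] = df['Col3'].map(lookup_dict)
print(df)
```

Building lookup_dict once means each row costs a single dictionary lookup rather than a scan over every list in sspdict.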

Upvotes: 0
