AVal
AVal

Reputation: 617

Add dictionary key as label if Pandas column contains any value from dictionary values

I have a dataFrame like this:

df = pd.DataFrame({'products' : ['a,b,c', 'a,c', 'b,d','a,b,c']})

    products
0   a,b,c
1   a,c
2   b,d
3   a,b,c

And I have created a dictionary that maps specific products to a certain category:

mydict = {'good':['a'],'bad':['d'],'neutral':['b','c','a']}

I'm trying to create a new column, let's say df['quality'] that adds the dictionary key (product category), if any of the products in df['products'] are included in the values for that particular key. Namely, the final output should look like so:

    products quality
0   a,b,c     good, neutral   
1   a,c       good, neutral
2   b,d       neutral, bad
3   a,b,c     good, neutral    

Any idea? Am I overcomplicating the problem?

Upvotes: 2

Views: 341

Answers (4)

rhug123
rhug123

Reputation: 8768

Here is another way:

d = {'a': ['good', 'neutral'],
          'b': ['neutral'],
          'c': ['neutral'],
          'd': ['bad']}

df['quality'] = df['products'].str.split(',').explode().map(d).explode().groupby(level=0).unique().str.join(',')

or (first part converts the current dictionary to the new format)

s = df['products'].str.split(',').explode()

d = {i:[] for i in set(s)}

for k,v in mydict.items():
    for i in v:
        d.get(i).append(k)

s.map(d).map(set).groupby(level=0).agg(lambda x: set.union(*x)).str.join(',')

Upvotes: 2

Amin Ba
Amin Ba

Reputation: 2436

you should define mydict like this:

mydict = {'a': ['good', 'neutral'],
          'b': ['neutral'],
          'c': ['neutral'],
          'd': ['bad']}

then:


def func(row):
    categories = []
    for item in row['products'].split(','):
        categories = categories + mydict[item]
    return ','.join(sorted(list(set(categories))))

    
df['quality'] = df.apply(lambda row: func(row), axis=1)

returns:

    products    quality
0   a,b,c       good,neutral
1   a,c         good,neutral
2   b,d         bad,neutral
3   a,b,c       good,neutral

Upvotes: 2

BENY
BENY

Reputation: 323306

Let us try

help = pd.Series(mydict).explode().reset_index().groupby(0)['index'].agg(','.join)

df['quality'] = df.products.replace(help,regex=True).str.split(',').map(set).str.join(',')
Out[150]: 
0    good,neutral
1    good,neutral
2     bad,neutral
3    good,neutral
Name: products, dtype: object

Upvotes: 2

Mustafa Aydın
Mustafa Aydın

Reputation: 18306

You can first generate a reversed dictionary that maps categories to products, e.g. a -> [good, neutral]. Then split the values in df over ,, explode them and map them with this reversed dict. Then gather them back with groupby and set over the flattened list products where lastly they are joined with , :

from collections import defaultdict
from itertools import chain

# form the dictionary
reversed_dict = defaultdict(list)
[reversed_dict[cat].append(prod) for prod, categs in mydict.items()
                                 for cat in categs]

# apply over the df
df["quality"] = (df.products
                   .str.split(",")
                   .explode()
                   .map(reversed_dict)
                   .groupby(level=0)
                   .agg(lambda s: ", ".join(set(chain.from_iterable(s)))))

to get

>>> df

  products        quality
0    a,b,c  good, neutral
1      a,c  good, neutral
2      b,d   bad, neutral
3    a,b,c  good, neutral

Upvotes: 2

Related Questions