Reputation: 617
I have a dataFrame like this:
df = pd.DataFrame({'products' : ['a,b,c', 'a,c', 'b,d','a,b,c']})
products
0 a,b,c
1 a,c
2 b,d
3 a,b,c
And I have created a dictionary that maps specific products to a certain category:
mydict = {'good':['a'],'bad':['d'],'neutral':['b','c','a']}
I'm trying to create a new column, let's say df['quality']
that adds the dictionary key (product category), if any of the products in df['products']
are included in the values for that particular key.
Namely, the final output should look like so:
products quality
0 a,b,c good, neutral
1 a,c good, neutral
2 b,d neutral, bad
3 a,b,c good, neutral
Any idea? Am I overcomplicating the problem?
Upvotes: 2
Views: 341
Reputation: 8768
Here is another way:
d = {'a': ['good', 'neutral'],
'b': ['neutral'],
'c': ['neutral'],
'd': ['bad']}
df['quality'] = df['products'].str.split(',').explode().map(d).explode().groupby(level=0).unique().str.join(',')
or (first part converts the current dictionary to the new format)
s = df['products'].str.split(',').explode()
d = {i:[] for i in set(s)}
for k,v in mydict.items():
for i in v:
d.get(i).append(k)
s.map(d).map(set).groupby(level=0).agg(lambda x: set.union(*x)).str.join(',')
Upvotes: 2
Reputation: 2436
you should define mydict like this:
mydict = {'a': ['good', 'neutral'],
'b': ['neutral'],
'c': ['neutral'],
'd': ['bad']}
then:
def func(row):
categories = []
for item in row['products'].split(','):
categories = categories + mydict[item]
return ','.join(sorted(list(set(categories))))
df['quality'] = df.apply(lambda row: func(row), axis=1)
returns:
products quality
0 a,b,c good,neutral
1 a,c good,neutral
2 b,d bad,neutral
3 a,b,c good,neutral
Upvotes: 2
Reputation: 323306
Let us try
help = pd.Series(mydict).explode().reset_index().groupby(0)['index'].agg(','.join)
df['quality'] = df.products.replace(help,regex=True).str.split(',').map(set).str.join(',')
Out[150]:
0 good,neutral
1 good,neutral
2 bad,neutral
3 good,neutral
Name: products, dtype: object
Upvotes: 2
Reputation: 18306
You can first generate a reversed dictionary that maps categories to products, e.g. a -> [good, neutral]
. Then split the values in df
over ,
, explode
them and map
them with this reversed dict. Then gather them back with groupby
and set
over the flattened list products where lastly they are joined with ,
:
from collections import defaultdict
from itertools import chain
# form the dictionary
reversed_dict = defaultdict(list)
[reversed_dict[cat].append(prod) for prod, categs in mydict.items()
for cat in categs]
# apply over the df
df["quality"] = (df.products
.str.split(",")
.explode()
.map(reversed_dict)
.groupby(level=0)
.agg(lambda s: ", ".join(set(chain.from_iterable(s)))))
to get
>>> df
products quality
0 a,b,c good, neutral
1 a,c good, neutral
2 b,d bad, neutral
3 a,b,c good, neutral
Upvotes: 2