Reputation:
I would need to create a new column as follows:
Suppose that my dataset looks like
Items Date
calzini 2020/02/23
cintura 2020/02/21
maglietta 2020/02/23
maglietta 2020/02/22
cappello 2020/02/23
jeans 2020/02/23
cappello 2020/02/22
maglietta 2020/02/22
maglietta 2020/02/22
jeans 2020/02/22
jeans 2020/02/23
maglietta 2020/02/23
jeans 2020/02/22
jeans 2020/02/23
I would like to have
Items Category
calzini bad
cintura bad
maglietta best seller
maglietta best seller
jeans best seller
cappello ok
jeans best seller
cappello ok
maglietta best seller
maglietta best seller
jeans best seller
maglietta best seller
jeans best seller
jeans best seller
I already determined the frequency of the items as follows:
sold_items=df.groupby(['Items'])['Date'].count().sort_values(ascending=False) # the items should be counted overall, not using a specific Date! It is about how many items were sold
I would like to ask you how to create a new column with those values.
Upvotes: 0
Views: 552
Reputation: 46898
You can use cut on value_counts:
pd.cut(df['Items'].value_counts(),bins=[0,1,4,10])
maglietta (4, 10]
jeans (4, 10]
cappello (1, 4]
calzini (0, 1]
cintura (0, 1]
Name: Items, dtype: category
Categories (3, interval[int64]): [(0, 1] < (1, 4] < (4, 10]]
So this cut does not include the lowest, hence round brackets on left, and includes highlist, square brackets on right. Now we convert these labels to what you need:
cats = pd.cut(df['Items'].value_counts(),bins=[0,1,4,10],labels=['bad','ok','best seller'])
And just map the values according to the category and assign it to a new column using .tonumpy() (thanks to @Ch3steR for pointing it out, see comments):
df['Category'] = cats[df['Items']].to_numpy()
df
Items Date Category
0 calzini 2020/02/23 bad
1 cintura 2020/02/21 bad
2 maglietta 2020/02/23 best seller
3 maglietta 2020/02/22 best seller
4 cappello 2020/02/23 ok
5 jeans 2020/02/23 best seller
6 cappello 2020/02/22 ok
7 maglietta 2020/02/22 best seller
8 maglietta 2020/02/22 best seller
9 jeans 2020/02/22 best seller
10 jeans 2020/02/23 best seller
11 maglietta 2020/02/23 best seller
12 jeans 2020/02/22 best seller
13 jeans 2020/02/23 best seller
You can also use df['Category'] = df['Items'].map(cats)
Upvotes: 2
Reputation: 8790
You can also replace the conditional values in value_counts
and then map
:
counts = df['Items'].value_counts()
counts = counts.replace(counts.values, ['best seller' if i >= 5 else ('ok' if i in [2,3,4] else 'bad') for i in counts])
df['category'] = df['Items'].map(counts)
Upvotes: 0
Reputation: 20445
According to what you have defined, you dont have dependency of category on date(as assumed by the outpu).
You can simply use apply function on axis 1
def testfun(e):
count = len(df[df["Items"] == e["Items"]])
if(count>=5):
return "best seller"
if(count>=2 and count<5 ):
return "ok"
else:
return "bad"
df["count"] = df.apply(testfun,axis=1)
1 cintura bad
2 maglietta best seller
3 maglietta best seller
4 cappello ok
5 jeans best seller
6 cappello ok
7 maglietta best seller
8 maglietta best seller
9 jeans best seller
10 jeans best seller
11 maglietta best seller
12 jeans best seller
13 jeans best seller
Upvotes: 0
Reputation: 8768
The code below should work.
df['category'] = pd.cut(df['sold_items'],bins = [0,1,4,df['sold_items'].max()],labels = ['bad','ok','best seller'])
Upvotes: 3
Reputation: 20669
You can use GroupBy.transform
and use np.select
vals = df['Items'].groupby(df['Items']).transform('count')
condlist = [vals.ge(5), (vals.ge(2) & vals.lt(5)) , vals.le(2)]
choicelist = ['best seller', 'ok', 'bad']
df.assign(category = np.select(condlist, choicelist))
Items Date category
0 calzini 2020/02/23 bad
1 cintura 2020/02/21 bad
2 maglietta 2020/02/23 best seller
3 maglietta 2020/02/22 best seller
4 cappello 2020/02/23 ok
5 jeans 2020/02/23 best seller
6 cappello 2020/02/22 ok
7 maglietta 2020/02/22 best seller
8 maglietta 2020/02/22 best seller
9 jeans 2020/02/22 best seller
10 jeans 2020/02/23 best seller
11 maglietta 2020/02/23 best seller
12 jeans 2020/02/22 best seller
13 jeans 2020/02/23 best seller
Upvotes: 2
Reputation: 10590
Use groupby
and transform
. You'll also need to create a function to categorize your items:
def categorize(x):
num = len(x)
if num >= 5:
return 'best seller'
elif num >= 3:
return 'ok'
else:
return 'bad'
df['category'] = df.groupby('Items').transform(categorize)
Upvotes: 0