punit kumar Sharma
punit kumar Sharma

Reputation: 113

Pandas: Groupby two columns and count the occurence of all values for 2nd column

I want to groupby my dataframe using two columns, one is yearmonth(format : 16-10) and other is number of cust. Then if number of cumstomers are more the six, i want to create one one row which replaces all the rows with number of cust = 6+ and sum of total values for number of cust >6.

This is how data looks like

index     month      num ofcust    count

0            10          1.0         1
1            10          2.0         1
2            10          3.0         1
3            10          4.0         1
4            10          5.0         1
5            10          6.0         1
6            10          7.0         1
7            10          8.0         1
8            11          1.0         1
9            11          2.0         1
10           11          3.0         1
11           12          12.0        1

Output:

index   month   no of cust  count

0       16-10   1.0         3
1       16-10   2.0         6
2       16-10   3.0         2
3       16-10   4.0         3
4       16-10   5.0         4
5       16-10   6+          4
6       16-11   1.0         4
7       16-11   2.0         3
8       16-11   3.0         2
9       16-11   4.0         1
10      16-11   5.0         3
11      16-11   6+          5

Upvotes: 4

Views: 2159

Answers (1)

jezrael
jezrael

Reputation: 863481

I believe you need replace all values >=6 first and then groupby + aggregate sum:

s = df['num ofcust'].mask(df['num ofcust'] >=6, '6+')
#alternatively
#s = df['num ofcust'].where(df['num ofcust'] <6, '6+')
df = df.groupby(['month', s])['count'].sum().reset_index()
print (df)
   month num ofcust  count
0     10          1      1
1     10          2      1
2     10          3      1
3     10          4      1
4     10          5      1
5     10         6+      3
6     11          1      1
7     11          2      1
8     11          3      1
9     12         6+      1

Detail:

print (s)
0      1
1      2
2      3
3      4
4      5
5     6+
6     6+
7     6+
8      1
9      2
10     3
11    6+
Name: num ofcust, dtype: object

Another very similar solution is append data to column first:

df.loc[df['num ofcust'] >= 6, 'num ofcust'] = '6+'
df = df.groupby(['month', 'num ofcust'], as_index=False)['count'].sum()
print (df)
   month num ofcust  count
0     10          1      1
1     10          2      1
2     10          3      1
3     10          4      1
4     10          5      1
5     10         6+      3
6     11          1      1
7     11          2      1
8     11          3      1
9     12         6+      1

Upvotes: 2

Related Questions