udothemath
udothemath

Reputation: 137

pandas group by and then select certain columns

I have an input dataframe

df_orders = pd.DataFrame({'item_id': [1, 1, 2, 2, 3, 4, 4, 5, 7, 8],
                        're_order':[0, 1, 0, 1, 1, 0, 1, 1, 1, 0],
                        'count':[27, 49, 3, 1, 6, 8, 14, 1, 1, 6] }, 
                         columns=['item_id', 're_order', 'count'])
or

   item_id  re_order  count
0        1         0     27
1        1         1     49
2        2         0      3
3        2         1      1
4        3         1      6
5        4         0      8
6        4         1     14
7        5         1      1
8        7         1      1
9        8         0      6
10       8         1      7

How should I group it using item_id and then only select the data where re_order = 1 as one column, and then have another column with total count?

My desired output

   item_id  reor_count  tot_count
0        1          49         76
1        2           1          4
2        3           6          6
3        4          14         22
4        5           1          1
5        7           1          1
6        8           7         13

I am able to find the total count (tot_count) using

df_orders_tot_count = df_orders.groupby('item_id')['count'].agg(['sum']).rename(columns={'sum': 'tot_count'}).reset_index()

      item_id  tot_count
0           1         76
1           2          4
2           3          6
3           4         22
4           5          1
5           7          1
6           8         13

but have difficulty to generate the reor_count column using Pythonic way. Any help will be appreciated!

Upvotes: 2

Views: 10015

Answers (2)

BENY
BENY

Reputation: 323306

You can try this

df.sort_values(['item_id','re_order']).groupby(['item_id'])['count'].agg({'count':'sum','re_order':'last'})
Out[244]: 
         re_order  count
item_id                 
1              49     76
2               1      4
3               6      6
4              14     22
5               1      1
7               1      1
8               6      6

Upvotes: 0

akuiper
akuiper

Reputation: 214987

You can precalculate the re_order column by multiplying re_order with count and then do groupby.sum:

(df_orders.assign(re_order = df_orders['re_order'] * df_orders['count'])
 .groupby('item_id', as_index=False).sum())

# item_id   re_order    count
#0      1       49       76
#1      2       1         4
#2      3       6         6
#3      4       14       22
#4      5       1         1
#5      7       1         1
#6      8       0         6

If you prefer the renamed columns names:

(df_orders.assign(re_order = df_orders['re_order'] * df_orders['count'])
 .groupby('item_id', as_index=False).sum()
 .rename(columns={'re_order': 'reor_count', 'count': 'tot_count'}))

Upvotes: 2

Related Questions