Parisa Zaeri

Reputation: 83

Can Pandas use a list for groupby?

import pandas as pd
import numpy as np

rng = np.random.RandomState(0)
df = pd.DataFrame({'key':['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)}, 
                  columns=['key', 'data1', 'data2'])
df

  key  data1  data2
0   A      0      5
1   B      1      0
2   C      2      3
3   A      3      3
4   B      4      7
5   C      5      9


L = [0, 1, 0, 1, 2, 0]
print(df.groupby(L).sum())

The output is:

   data1  data2
0      7     17
1      4      3
2      4      7

I need a clear explanation, please! What are the 0s, 1s and 2 in L? Are they the key column of df, or are they index labels of df? And how does groupby group the rows based on L?

Upvotes: 3

Views: 99

Answers (3)

oppressionslayer

Reputation: 7224

You can see here how it's working:

In [6006]: df.groupby(L).agg(list)
Out[6006]:
         key      data1      data2
0  [A, C, C]  [0, 2, 5]  [5, 3, 9]
1     [B, A]     [1, 3]     [0, 3]
2        [B]        [4]        [7]

In [6002]: list(df.groupby(L))
Out[6002]:
[(0,   key  data1  data2
  0   A      0      5
  2   C      2      3
  5   C      5      9),
 (1,   key  data1  data2
  1   B      1      0
  3   A      3      3),
 (2,   key  data1  data2
  4   B      4      7)]

L assigns each row a group label by position: label 0 collects the rows at index 0, 2 and 5 (keys A, C, C), label 1 collects the rows at index 1 and 3 (keys B, A), and label 2 collects the row at index 4 (key B).
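A minimal sketch of how to inspect that membership directly, assuming pandas and NumPy and rebuilding the df and L from the question. `GroupBy.indices` maps each group label to the row positions in that group, and iterating the groupby yields `(label, sub-DataFrame)` pairs:

```python
import numpy as np
import pandas as pd

# Rebuild the df and L from the question
rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])
L = [0, 1, 0, 1, 2, 0]

grouped = df.groupby(L)

# .indices maps each group label to the integer positions of its rows
membership = {int(label): positions.tolist()
              for label, positions in grouped.indices.items()}
print(membership)      # {0: [0, 2, 5], 1: [1, 3], 2: [4]}

# Iterating the groupby yields (label, sub-DataFrame) pairs
keys_per_group = {int(label): group['key'].tolist() for label, group in grouped}
print(keys_per_group)  # {0: ['A', 'C', 'C'], 1: ['B', 'A'], 2: ['B']}
```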

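This works because L is matched to the rows by position: the first element of L labels row 0, the second labels row 1, and so on. Adding L as a column makes that alignment visible: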

df['L'] = L

  key  data1  data2  L
0   A      0      5  0
1   B      1      0  1
2   C      2      3  0
3   A      3      3  1
4   B      4      7  2
5   C      5      9  0
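A small sketch, assuming pandas and NumPy, that checks this equivalence: grouping by the bare list gives the same sums as attaching L as a column and grouping by the column name. `df.assign` is used so the original df is not modified, and the two numeric columns are selected explicitly so the result does not depend on how a given pandas version treats the string `key` column.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])
L = [0, 1, 0, 1, 2, 0]

cols = ['data1', 'data2']  # select the numeric columns explicitly

# 1) Group by the bare list: L[i] is the group label of the i-th row
by_list = df.groupby(L)[cols].sum()

# 2) Attach L as a column (on a copy) and group by that column's name
by_column = df.assign(L=L).groupby('L')[cols].sum()

# Same numbers; only the index name differs (None vs 'L')
print(by_list.to_numpy().tolist())    # [[7, 17], [4, 3], [4, 7]]
print(by_column.to_numpy().tolist())  # [[7, 17], [4, 3], [4, 7]]
print(by_list.index.name, by_column.index.name)  # None L
```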

I hope this makes sense

Upvotes: 0

Nicolas Gervais

Reputation: 36664

You can use a list to group observations in your dataframe. For instance, say you have the heights of a few people:

import pandas as pd

df = pd.DataFrame({'names':['John', 'Mark', 'Fred', 'Julia', 'Mary'],
                   'height':[180, 180, 180, 160, 160]})

print(df)
   names  height
0   John     180
1   Mark     180
2   Fred     180
3  Julia     160
4   Mary     160

And elsewhere, you received their assigned groups:

sex = ['man', 'man', 'man', 'woman', 'woman']

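You don't need to add a new column to your dataframe just to group by it; the list can do the work. Select the numeric height column, since recent pandas versions raise an error when asked to average the string names column:

df.groupby(sex)[['height']].mean()
       height
man     180.0
woman   160.0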
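To make the "list does the work" idea concrete, here is a hand-written split-apply-combine over the same data, a sketch assuming only pandas and the standard library. It pairs each label in `sex` with the height at the same row position, averages each bucket, and compares the result with pandas grouping by the list directly:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'names': ['John', 'Mark', 'Fred', 'Julia', 'Mary'],
                   'height': [180, 180, 180, 160, 160]})
sex = ['man', 'man', 'man', 'woman', 'woman']

# Split: pair each label in `sex` with the height in the same row position
buckets = defaultdict(list)
for label, height in zip(sex, df['height']):
    buckets[label].append(height)

# Apply + combine: average each bucket
manual = {label: sum(values) / len(values) for label, values in buckets.items()}
print(manual)  # {'man': 180.0, 'woman': 160.0}

# The same result from pandas, grouping by the list directly
print(df.groupby(sex)['height'].mean().to_dict())  # {'man': 180.0, 'woman': 160.0}
```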

Upvotes: 2

adhg

Reputation: 10883

In your example, L is a list of integers. When you group by L, you are simply saying: look at this list of integers and group the rows of my df by its unique values, where the i-th integer is the label of the i-th row.
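A sketch, assuming pandas and NumPy and the df and L from the question, that answers the question's direct point: the values in L are neither a column of df nor its index labels. Grouping by the `key` column forms different groups, and giving the rows entirely new index labels does not change how L groups them, because L is matched to rows by position.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])
L = [0, 1, 0, 1, 2, 0]
cols = ['data1', 'data2']  # numeric columns only

by_L = df.groupby(L)[cols].sum()

# L is not the 'key' column: grouping by 'key' forms different groups
by_key = df.groupby('key')[cols].sum()
print(by_key['data1'].tolist())  # [3, 5, 7]  (A, B, C)

# L is not a list of index labels either: give the rows new labels
# (none of which are 0, 1 or 2) and group by the same L
relabeled = df.copy()
relabeled.index = [10, 20, 30, 40, 50, 60]
by_L_relabeled = relabeled.groupby(L)[cols].sum()

# The i-th value of L still labels the i-th row, so the sums are unchanged
print(by_L['data1'].tolist())            # [7, 4, 4]
print(by_L_relabeled['data1'].tolist())  # [7, 4, 4]
```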

I think a visualization helps (note that the df doesn't actually have a column L; I just added it for the visualization):


Grouping by L means: take the unique values of L (in this case 0, 1 and 2) and, for each one, sum data1 and data2 over the rows carrying that label. So for L = 0 the result for data1 is 0 + 2 + 5 = 7 (and so on).
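The same computation written out by hand, as a sketch assuming pandas and NumPy: loop over the unique labels of L, build a boolean mask selecting the rows with that label, and sum `data1` and `data2` over those rows. It reproduces the 7/17, 4/3 and 4/7 totals that `groupby` returns.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])
L = [0, 1, 0, 1, 2, 0]

labels = np.asarray(L)
manual = {}
for value in np.unique(labels):        # the unique labels, sorted: 0, 1, 2
    mask = labels == value             # True for the rows carrying this label
    subset = df.loc[mask, ['data1', 'data2']]
    manual[int(value)] = subset.sum().tolist()

print(manual)  # {0: [7, 17], 1: [4, 3], 2: [4, 7]}

# Compare with pandas' own result
expected = df.groupby(L)[['data1', 'data2']].sum()
print(expected.to_numpy().tolist())  # [[7, 17], [4, 3], [4, 7]]
```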


and the end result would be:

df.groupby(L).sum()

   data1  data2
0      7     17
1      4      3
2      4      7

Upvotes: 2
