Reputation: 83
import pandas as pd
import numpy as np
rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])
df
key data1 data2
0 A 0 5
1 B 1 0
2 C 2 3
3 A 3 3
4 B 4 7
5 C 5 9
L = [0, 1, 0, 1, 2, 0]
print(df.groupby(L).sum())
The output is:
data1 data2
0 7 17
1 4 3
2 4 7
I need a clear explanation, please. What are the 0s, 1s and 2 in L? Are they values of the key column of df, or are they index labels of df? And how does groupby group based on L?
Upvotes: 3
Views: 99
Reputation: 7224
You can see here how it's working:
In [6006]: df.groupby(L).agg(list)
Out[6006]:
key data1 data2
0 [A, C, C] [0, 2, 5] [5, 3, 9]
1 [B, A] [1, 3] [0, 3]
2 [B] [4] [7]
In [6002]: list(df.groupby(L))
Out[6002]:
[(0, key data1 data2
0 A 0 5
2 C 2 3
5 C 5 9),
(1, key data1 data2
1 B 1 0
3 A 3 3),
(2, key data1 data2
4 B 4 7)]
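Each distinct value in L becomes a group key. Key 0 collects the rows at index 0, 2 and 5 (keys A, C, C); key 1 collects the rows at index 1 and 3 (keys B, A); and key 2 collects the row at index 4 (key B).
This is because L is lined up with the rows of df by position, as if it were an extra column: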
df['L'] = L
key data1 data2 L
0 A 0 5 0
1 B 1 0 1
2 C 2 3 0
3 A 3 3 1
4 B 4 7 2
5 C 5 9 0
I hope this makes sense
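To check that claim, here is a minimal sketch comparing the two routes: grouping directly by the list, and grouping by a helper column that holds the same labels. The data2 values are typed in by hand from the question's output so the example does not depend on the random seed, and the numeric columns are selected explicitly because newer pandas versions would otherwise also try to sum the string column key.

```python
import pandas as pd

# Same frame as in the question; data2 copied from the printed output.
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': [5, 0, 3, 3, 7, 9]})
L = [0, 1, 0, 1, 2, 0]

# Route 1: group by the list itself. L[i] is the group label of row i.
by_list = df[['data1', 'data2']].groupby(L).sum()

# Route 2: put the labels in a helper column and group by that column.
by_column = df.assign(L=L).groupby('L')[['data1', 'data2']].sum()

print(by_list)
print(by_column)
```

Both frames should hold the same sums; the only difference is that the second one has an index named L.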
Upvotes: 0
Reputation: 36664
You can use a list to group observations in your dataframe. For instance, say you have the heights of a few people:
import pandas as pd
df = pd.DataFrame({'names': ['John', 'Mark', 'Fred', 'Julia', 'Mary'],
                   'height': [180, 180, 180, 160, 160]})
print(df)
names height
0 John 180
1 Mark 180
2 Fred 180
3 Julia 160
4 Mary 160
And elsewhere, you received their assigned groups:
sex = ['man', 'man', 'man', 'woman', 'woman']
You won't need to concatenate a new column to your dataframe just to group them. You can use the list to do the work:
df.groupby(sex).mean(numeric_only=True)
height
man 180
woman 160
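As a rough sketch of what happens behind that call, the snippet below lists which row labels land in each group and then averages only the height column. Selecting the column is a choice made here, not part of the original answer: on recent pandas versions, averaging a frame that still contains the string column names may raise a TypeError.

```python
import pandas as pd

df = pd.DataFrame({'names': ['John', 'Mark', 'Fred', 'Julia', 'Mary'],
                   'height': [180, 180, 180, 160, 160]})
sex = ['man', 'man', 'man', 'woman', 'woman']

grouped = df.groupby(sex)

# Which row labels ended up under each group label.
members = {label: list(index) for label, index in grouped.groups.items()}
print(members)

# Mean of the numeric column only, one value per group.
means = grouped['height'].mean()
print(means)
```

The list is never stored in df; it only has to have one entry per row, in row order.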
Upvotes: 2
Reputation: 10883
L is a list of integers in your example. When you group by L, you are simply saying: look at this list of integers and group my df based on its unique values.
I think visualizing it will make sense (note that the df doesn't have a column L; I just added it for visualization):
Grouping by L means: take the unique values (in this case 0, 1 and 2) and sum data1 and data2 within each one. So for L=0 the result for data1 is 0+2+5=7, and so on.
The end result would be:
df.groupby(L).sum()
data1 data2
0 7 17
1 4 3
2 4 7
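The same arithmetic can be spelled out by hand. This is only an illustrative sketch of the idea, not how pandas implements groupby internally; the data2 values are copied from the question's printed frame, and the names labels and manual are introduced here for the example.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': [5, 0, 3, 3, 7, 9]})
L = [0, 1, 0, 1, 2, 0]

# Pair each row with its label from L, position by position.
labels = pd.Series(L, index=df.index)

manual = {}
for label in sorted(set(L)):
    # Keep only the rows whose label matches, then sum the numeric columns.
    rows = df.loc[labels == label, ['data1', 'data2']]
    manual[label] = rows.sum().tolist()

print(manual)  # {0: [7, 17], 1: [4, 3], 2: [4, 7]}
```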
Upvotes: 2