Given the following sample DataFrame:
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'y', 'y', 'x', 'x', 'x']})
I want to generate a scatterplot with one point per unique (A, B) pair, plotting the unique values of B against their corresponding values of A, with each point sized by how many rows share that pair. So I want to get the following three lists:
A = [ 1, 1, 1, 2, 2, 3 ]
B = ['x', 'y', 'z', 'x', 'y', 'x']
Bsize = [ 1, 1, 1, 1, 2, 3]
I've tried doing this with groupby:
group = df.groupby(['A','B'])
The keys of the group contain the data I want, but they're not ordered:
group.groups.keys()
[(1, 2), (1, 3), (3, 1), (2, 1), (2, 2), (1, 1)]
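groupby sorts the group keys by default (sort=True), but .groups is a dict-like mapping, so its printed order is not something to rely on. A minimal sketch, assuming only pandas and the sample frame above: sort the keys yourself to get the (A, B) pairs in a guaranteed order.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'y', 'y', 'x', 'x', 'x']})
group = df.groupby(['A', 'B'])

# .groups maps each (A, B) key to the row labels in that group.
# Sorting the keys makes the order explicit instead of relying on dict order.
keys = sorted(group.groups)

# Split the sorted (A, B) tuples into the two lists.
A = [a for a, _ in keys]
B = [b for _, b in keys]
```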
The first() method returns what looks like a DataFrame, but I can't access the 'A' and 'B' columns:
group.first()['A']
...
KeyError: 'A'
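The KeyError happens because, after an aggregation such as first() (with the default as_index=True), the grouping columns 'A' and 'B' become the levels of the result's MultiIndex rather than columns. A minimal sketch that uses size() as the aggregation (the index layout is the same) and reads the levels back out of the index:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'y', 'y', 'x', 'x', 'x']})

# 'A' and 'B' live in result.index (a MultiIndex), not in result's columns,
# which is why indexing the result with ['A'] raises KeyError.
result = df.groupby(['A', 'B']).size()

# Pull each level out of the index instead.
A = result.index.get_level_values('A').tolist()
B = result.index.get_level_values('B').tolist()
```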
If I iterate through the names and groups, things seem to be ordered, so I can get what I want by doing:
A = []
B = []
for name, _ in group:
    A.append(name[0])
    B.append(name[1])
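The same loop can be written more compactly. A sketch, assuming the group object from the question: iterating a groupby yields (key, sub_frame) pairs, and with two grouping columns each key is an (A, B) tuple, so zip(*...) can transpose the keys into the two lists.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'y', 'y', 'x', 'x', 'x']})
group = df.groupby(['A', 'B'])

# Collect the (A, B) keys in iteration order, then transpose them into
# one sequence of A values and one of B values, converted to lists.
A, B = map(list, zip(*(name for name, _ in group)))
```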
I can then get the Bsize list by doing:
group['B'].count().values
array([1, 1, 1, 1, 2, 3])
However, this seems clunky in the extreme and suggests to me I haven't understood how to properly use the group.
IIUC, maybe you can do the following (after import numpy as np):
In [52]: group = df.groupby(['A','B']).apply(np.unique).reset_index()
In [53]: group
Out[53]:
   A  B       0
0  1  x  [1, x]
1  1  y  [1, y]
2  1  z  [1, z]
3  2  x  [2, x]
4  2  y  [2, y]
5  3  x  [3, x]
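If only the unique (A, B) pairs are needed (without counts), a different technique that avoids apply altogether is drop_duplicates. This is an alternative sketch, not the approach above: keep the first occurrence of each distinct pair, then sort so the order matches groupby's default sorted key order.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'y', 'y', 'x', 'x', 'x']})

# One row per distinct (A, B) pair, sorted by A and then B.
pairs = df.drop_duplicates(subset=['A', 'B']).sort_values(['A', 'B'])
A = pairs['A'].tolist()
B = pairs['B'].tolist()
```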
then:
In [57]: A = group['A'].tolist()
In [58]: B = group['B'].tolist()
In [59]: A
Out[59]: [1, 1, 1, 2, 2, 3]
In [60]: B
Out[60]: ['x', 'y', 'z', 'x', 'y', 'x']
To get all the lists you need in one shot, you can:
In [87]: group = df.groupby(['A','B']).size().reset_index(name='s')
In [88]: group
Out[88]:
   A  B  s
0  1  x  1
1  1  y  1
2  1  z  1
3  2  x  1
4  2  y  2
5  3  x  3
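A variant of the same step, assuming pandas 1.1 or later (where, as I understand it, size() with as_index=False returns a DataFrame with a 'size' column): the grouping columns stay as regular columns, so reset_index is not needed.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'y', 'y', 'x', 'x', 'x']})

# as_index=False keeps A and B as columns; the group sizes land in 'size'.
group = df.groupby(['A', 'B'], as_index=False).size()
A = group['A'].tolist()
B = group['B'].tolist()
Bsize = group['size'].tolist()
```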
Bsize:
In [91]: group['s'].tolist()
Out[91]: [1, 1, 1, 1, 2, 3]
A:
In [92]: group['A'].tolist()
Out[92]: [1, 1, 1, 2, 2, 3]
B:
In [93]: group['B'].tolist()
Out[93]: ['x', 'y', 'z', 'x', 'y', 'x']
EDIT: the last DataFrame contains all the information you need, so only that last step is required to get all three lists.
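Putting the last step together, a minimal sketch wrapped in a helper; the function name pair_lists and its x/y parameters are hypothetical, introduced here only for illustration.

```python
import pandas as pd

def pair_lists(df, x='A', y='B'):
    """Hypothetical helper: return (x values, y values, counts) for each
    distinct (x, y) pair, ordered by x and then y."""
    # size() counts rows per (x, y) group; reset_index moves the keys back
    # into columns and names the count column 's'.
    counts = df.groupby([x, y]).size().reset_index(name='s')
    return counts[x].tolist(), counts[y].tolist(), counts['s'].tolist()

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'y', 'y', 'x', 'x', 'x']})
A, B, Bsize = pair_lists(df)
```

For the scatterplot, these lists could then be passed to something like matplotlib's scatter(A, B, s=...); since s is the marker area in points squared, the raw counts will probably need scaling by some constant to be visible.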