Reputation: 732
This question is similar to this other question.
I have a pandas dataframe. I want to split it into groups, and select an arbitrary member of each group, defined elsewhere.
Example: I have a dataframe that can be divided into 6 groups of 4 observations each. I want to extract one observation from each group, at the within-group positions given by:
selected = [0,3,2,3,1,3]
This is very similar to
df.groupby('groupvar').nth(n)
But, crucially, n varies for each group according to the selected list.
Thanks!
Upvotes: 0
Views: 1242
Reputation: 16498
Typically, everything you do within groupby should be group independent. So within any groupby.apply(), you only get the group itself, not its context. An alternative is to compute the positional index for the whole sample (index below) from the positions within the groups (here, selected). Note that the dataset must be sorted by group for the following to work.
I use test, from which I want to select the rows given by selected:
In[231]: test
Out[231]:
           score
  name
0 A    -0.208392
1 A    -0.103659
2 A     1.645287
0 B     0.119709
1 B    -0.047639
2 B    -0.479155
0 C    -0.415372
1 C    -1.390416
2 C    -0.384158
3 C    -1.328278
from numpy import array
selected = [0, 2, 1]
# Number of rows in each group
c = test.groupby(level=1).count()
In[242]: index = c.shift(1).cumsum().add(array([selected]).T, fill_value=0)
In[243]: index
Out[243]:
      score
name
A         0
B         5
C         7
In[255]: test.iloc[index.values[:,0]]
Out[255]:
           score
  name
0 A    -0.208392
2 B    -0.479155
1 C    -1.390416
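For reference, here is a self-contained sketch of the same idea that should run as a script. It rebuilds a frame shaped like test (the construction is my assumption about how the data was created) and computes the group start offsets with NumPy instead of shift/cumsum on the count frame, which keeps the positions as integers. The names sizes, starts and positions are only illustrative. As above, it assumes the rows are sorted by group.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of `test`: rows sorted by group name,
# with the within-group counter as the first index level.
names = ["A"] * 3 + ["B"] * 3 + ["C"] * 4
within = [0, 1, 2, 0, 1, 2, 0, 1, 2, 3]
scores = [-0.208392, -0.103659, 1.645287, 0.119709, -0.047639,
          -0.479155, -0.415372, -1.390416, -0.384158, -1.328278]
test = pd.DataFrame(
    {"score": scores},
    index=pd.MultiIndex.from_arrays([within, names], names=[None, "name"]),
)

# Within-group position to pick, one entry per group (in sorted group order)
selected = [0, 2, 1]

# Number of rows in each group, in sorted group order
sizes = test.groupby(level="name").size().to_numpy()

# Row position at which each group starts: 0, then the running total of sizes
starts = np.concatenate(([0], np.cumsum(sizes)[:-1]))

# Absolute row position = group start + within-group position
positions = starts + np.asarray(selected)

result = test.iloc[positions]
print(result)
```

If the frame is not sorted by group, sort it first (for example with sort_index on the group level), otherwise the offsets will not line up with the rows.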
Upvotes: 1