Reputation: 25
I have a large data source from which I want to gather the User IDs (column 'A') for a specific group of people, based on the value in column 'B'. I have created a new dataframe with the info that I need using:
df2 = df1[df1['B'].isin([8,9,9.5,10,11])]
Now I need to get the first 40 values from col 'A' for value 8 in col 'B', then the first 32 values from col 'A' for value 9, and so on. Taking the first rows works because my data is already sorted by the most relevant users; I just need to pick out X of them per value in col 'B'.
I want the output of that to be in this format ideally:
A B
ID1 8
ID2 8
. .
ID41 9
ID42 9
I thought of using this for example
df2[(df2['B']== 8)][0:40]
but then I have to slice the dataframe X times to get all the User IDs for the values I need. There must be a quick way to specify the number of rows to take for each value in col 'B' without slicing once per value.
Thanks in advance!
Upvotes: 0
Views: 340
Reputation: 323306
First build a dict that maps each value in B to the number of rows to keep, then use groupby with head:
# d needs an entry for every value of B present in df, otherwise d[...] raises KeyError
d = {8: 40, 9: 32}
out = df.groupby('B').apply(lambda x: x.head(d[x['B'].iloc[0]])).reset_index(drop=True)
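A small self-contained sketch of the same idea, for anyone who wants to try it on toy data. The sample frame, the name `limits`, and the row counts are made up for illustration, and it loops over the groups with `pd.concat` instead of calling `apply`, which is a variation on the line above rather than the exact same call:

```python
import pandas as pd

# Hypothetical sample data, already sorted by relevance within each B value
df2 = pd.DataFrame({
    "A": [f"ID{i}" for i in range(1, 11)],
    "B": [8, 8, 8, 8, 9, 9, 9, 10, 10, 10],
})

# Assumed limits: number of rows to keep per value of B
limits = {8: 2, 9: 1, 10: 3}

# Take the first limits[key] rows of each group, then stack the pieces
out = pd.concat(
    [group.head(limits[key]) for key, group in df2.groupby("B")]
).reset_index(drop=True)

print(out)
```

As with the `apply` version, every value of B in the frame needs an entry in the dict, or the lookup raises a KeyError.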
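Or use cumcount, which numbers the rows within each group starting from 0, and keep the rows whose number is below the limit for their group: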
out = df[df.groupby('B').cumcount() < df.B.map(d)]
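To see what the cumcount version does step by step, here is a minimal sketch on made-up data. The sample frame, the intermediate names `rank` and `limit`, and the counts in `d` are assumptions for the example only:

```python
import pandas as pd

# Hypothetical sample data, already sorted by relevance within each B value
df = pd.DataFrame({
    "A": [f"ID{i}" for i in range(1, 9)],
    "B": [8, 8, 8, 9, 9, 9, 11, 11],
})

# Assumed limits; 11 is deliberately left out
d = {8: 2, 9: 1}

rank = df.groupby("B").cumcount()  # position of each row within its B group: 0, 1, 2, ...
limit = df["B"].map(d)             # limit for each row; NaN where B is not a key of d

# A comparison against NaN is False, so rows with an unmapped B are dropped
out = df[rank < limit]

print(out)
```

Note that this keeps the original row order and index, and values of B that are missing from the dict are silently dropped instead of raising an error.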
Upvotes: 2