kasje

Reputation: 25

Selecting a specific number of results from columns in Pandas

I have a large data source from which I want to gather the user IDs (column 'A') for a specific group of people, based on the value in column 'B'. I created a new dataframe with the rows I need using:

df2 = df1[df1['B'].isin([8,9,9.5,10,11])] 

Now I need to get the first 40 values from column 'A' for value 8 in column 'B', then the first 32 values from column 'A' for value 9, and so on. Taking the first rows works because my data is already sorted by the most relevant users; I just need to pick out X of them per value in column 'B'.

I want the output of that to be in this format ideally:

 A   B 
ID1  8
ID2  8
. . 
ID41 9 
ID42 9

I thought of using something like this:

df2[(df2['B']== 8)][0:40]

but then I have to slice the dataframe once per value to get all the user IDs I need. There must be a quicker way to specify the number of rows to take for each value in column 'B' without slicing separately for each one.

Thanks in advance!

Upvotes: 0

Views: 340

Answers (1)

BENY

Reputation: 323306

First build a dict that maps each value in 'B' to the number of rows to keep, then use groupby with head:

d = {8: 40, 9: 32}  # add an entry for every other value of 'B' you keep (9.5, 10, 11)

out = df.groupby('B').apply(lambda x : x.head(d[x['B'].iloc[0]])).reset_index(drop=True)
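As a rough illustration of the same idea, here is a minimal sketch that takes the head of each group in an explicit loop instead of `apply`. The sample data, the small row counts, and the names `limits` and `head_per_group` are made up for the example; values of 'B' that have no entry in the dict are simply skipped, which is an assumption about the desired behaviour.

```python
import pandas as pd

# Hypothetical sample data, already sorted by relevance as in the question.
df = pd.DataFrame({
    "A": [f"ID{i}" for i in range(1, 11)],
    "B": [8, 8, 8, 9, 9, 9, 9, 10, 10, 11],
})

# Hypothetical limits: rows to keep per value of 'B' (small numbers for the demo).
limits = {8: 2, 9: 3}


def head_per_group(frame, column, limits):
    """Keep the first limits[key] rows of each group; skip keys without a limit."""
    parts = [
        group.head(limits[key])
        for key, group in frame.groupby(column, sort=False)
        if key in limits
    ]
    if not parts:
        return frame.iloc[0:0]  # empty frame with the same columns
    return pd.concat(parts).reset_index(drop=True)


out = head_per_group(df, "B", limits)
print(out)
```

With the real data the dict would hold the actual counts, for example 40 for value 8 and 32 for value 9.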

Or use cumcount to build a boolean mask:

out = df[df.groupby('B').cumcount() < df.B.map(d)]
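To show how the cumcount comparison behaves, here is a small sketch on made-up data. `cumcount` numbers the rows within each group starting at 0, and `map` looks up the per-group limit for each row, so a row is kept while its position is below its limit. The frame `df2`, the dict `limits`, and the tiny row counts are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data, already sorted by relevance; 'B' contains a float value.
df2 = pd.DataFrame({
    "A": [f"ID{i}" for i in range(1, 9)],
    "B": [8, 8, 8, 9, 9, 9.5, 9.5, 10],
})

# Hypothetical limits; value 10 is deliberately left out.
limits = {8: 2, 9: 1, 9.5: 1}

rank = df2.groupby("B").cumcount()  # 0-based position of each row within its group
quota = df2["B"].map(limits)        # limit for each row; NaN where 'B' has no entry

# A comparison against NaN is False, so rows whose 'B' has no limit are dropped.
out = df2[rank < quota]
print(out)
```

This keeps the original row order and index, so no reset_index is needed unless a fresh index is wanted.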

Upvotes: 2
