daiyue
daiyue

Reputation: 7458

pandas how to count the number of rows whose column values add up to a threshold

I have the following df,

pct    id
0.3    631
0.2    115
0.1    312
0.2    581
0.01   574
0.09   586

I want to first sort the df by pct,

df.sort_values(by=['pct'], ascending=False, inplace=True)

then adding up pct to 0.8 and count how many rows does that, e.g. top 4 rows in this case; I am wondering whats the best way to it. using pd.eval or pd.query?

Upvotes: 1

Views: 63

Answers (1)

jezrael
jezrael

Reputation: 863701

Use cumulative sum by Series.cumsum like mentioned @Chris A and if need count number of rows use sum:

out = df['pct'].cumsum().lt(0.8).sum()
print (out)
4

Or filter by boolean indexing:

df1 = df[df['pct'].cumsum().lt(0.8)]
print (df1)
   pct   id
0  0.3  631
1  0.2  115
3  0.2  581
2  0.1  312

Upvotes: 3

Related Questions