CharlesD
CharlesD

Reputation: 111

While loop on a dataframe column?

I have a small dataframe comprised of two columns, an ORG column and a percentage column. The dataframe is sorted largest to smallest based on the percentage column.

I'd like to create a while loop that adds up the values in the percentage column up until it hits a value of .80 (80%).

So far I've tried:

retail_pareto = 0
counter = 0
while retail_pareto < .80:
    retail_pareto += retailerDF[counter]['RETAILER_PCT_OF_CHANGE']
    counter += 1

This does not work, both the counter and the counter and retail_pareto value remain at zero with no real error message to help me troubleshoot what I'm doing incorrectly. Ideally, I'd like to end up with a list of the orgs with the largest percentage that together add up to 80%.

I'm not exactly sure what to try next. I've searched these forums, but haven't found anything similar in the forums yet.

Any advice or help is much appreciated. Thank you.

Example Dataframe:
ORG    PCT
KST    0.582561
ISL    0.290904
BOV    0.254456
BRH    0.10824
GNT    0.0913631
DSH    0.023441
RDM    -0.0119665
JBL    -0.0348893
JBD    -0.071883
WEG    -0.232227



The output that I would expect would be something along the lines of:
ORG    PCT
KST    0.582561
ISL    0.290904

Upvotes: 0

Views: 1552

Answers (3)

Valdi_Bo
Valdi_Bo

Reputation: 31011

Instead of your loop, take a more pandasonic approach. Start with computing an additional column containing cumulative sum of RETAILER_PCT_OF_CHANGE:

df['pct_cum'] = df.RETAILER_PCT_OF_CHANGE.cumsum()

For your data, the result is:

   ORG  RETAILER_PCT_OF_CHANGE   pct_cum
0  KST                0.582561  0.582561
1  ISL                0.290904  0.873465
2  BOV                0.254456  1.127921
3  BRH                0.108240  1.236161
4  GNT                0.091363  1.327524
5  DSH                0.023441  1.350965
6  RDM               -0.011967  1.338999
7  JBL               -0.034889  1.304109
8  JBD               -0.071883  1.232226
9  WEG               -0.232227  0.999999

And now, to print rows which totally include 80 % of change, ending on the first row above the limit, run:

df[df.pct_cum.shift(1).fillna(0) < 0.8]

The result, together with the cumulated sum, is:

   ORG  RETAILER_PCT_OF_CHANGE   pct_cum
0  KST                0.582561  0.582561
1  ISL                0.290904  0.873465

Upvotes: 1

ansev
ansev

Reputation: 30930

Use:

df_filtered = df.loc[df['PCT'].shift(fill_value=0).cumsum().le(0.80),:]
#if you don't want include where cumsum is greater than 0,80
#df_filtered = df.loc[df['PCT'].cumsum().le(0.80),:] 
print(df_filtered)
   ORG       PCT
0  KST  0.582561
1  ISL  0.290904

Upvotes: 2

Jon Behnken
Jon Behnken

Reputation: 560

Can you use this example to help you?

import pandas as pd

retail_pareto = 0
orgs = []
for i,row in retailerDF.iterrows():
    if retail_pareto <= .80:
        retail_pareto += row['RETAILER_PCT_OF_CHANGE']
        orgs.append(row)
    else:
        break

new_df = pd.DataFrame(orgs)

Edit: made it more like your example and added the new DataFrame.

Upvotes: 1

Related Questions