Chris90
Chris90

Reputation: 1998

Adding Calculation to a lambda or function

I have a df such as

    ID |    Status   | Color
   555    Cancelled     Green
   434    Processed     Red   
   212    Cancelled     Blue
   121    Cancelled     Green
   242    Cancelled     Blue
   352    Processed     Green
   343    Processed     Blue

I am using code such as below:

cc = df.groupby(by='Color').ID.count()
df.groupby(by=['Color', 'Status']).apply(lambda x: len(x)/cc.loc[x.Color.iloc[0]])

This gives me output such as

Color     Status   
Blue   Cancelled    0.666667
       Processed    0.333333
Green  Cancelled    0.666667
       Processed    0.333333
Red    Processed    1.000000
dtype: float64

Which gives me percentage of the status of each color.

There is also a field called dollar_value where each row contains dollar amounts, if I want to add two fields to my output of 1. Total_Dollars for that color and status and 2. the dollar_per_order for that color ( meaning if the Total_Dollars is 1000 and there are 200 rows for that color and status it would be 1000/200 or 5. Can I easily add both of these calculations to my output that I already have? Or would I need to create a function?

Desired Output:

    Color     Status              Total |Dollar_Per_Order                
    Blue   Cancelled    0.666667  1000       20
           Processed    0.333333  200        5
    Green  Cancelled    0.666667  2000       20
           Processed    0.333333  1000       5
    Red    Processed    1.000000  300        10
    dtype: float64

Thanks!

Upvotes: 1

Views: 59

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30991

To compute all 3 columns, define a function to be applied to each group as:

def fn(grp):
    total = grp.dollar_value.sum()
    rowNo = len(grp.index)
    return pd.Series([ rowNo/cc[grp.name[0]], total, total/rowNo ],
        index=[ 'Percentage', 'Total_Dollars', 'Dollar_per_order'])

Then apply it:

df.groupby(by=['Color', 'Status']).apply(fn)

Note that I used len(grp.index) instead of len(grp). The reason is that it runs a little quicker.

I also read the Color of the current group other way than you.

Upvotes: 1

Related Questions