Robert King
Robert King

Reputation: 994

Is there a concise way to produce this dataframe mask?

I have a pandas DataFrame with a model score and order_amount_bucket. There are 8 bins in the order amount bucket and I have a different threshold for each bin. I want to filter the frame and produce a boolean mask showing which rows pass.

I can do this by exhaustively listing the conditions but I feel like there must be a more pythonic way to do this.

A small example of how I have made this work so far (with only 3 bins for simplicity).

import pandas as pd

sc = 'score'
amt = 'order_amount_bucket'
example_data = {sc:[0.5, 0.8, 0.99, 0.95, 0.8,0.8],
                amt: [1, 2, 2, 2, 3, 1]}

thresholds = [0.7, 0.8, 0.9]

df = pd.DataFrame(example_data)

# the exhaustive method to create the pass mask
# is there a better way to do this part?
pass_mask = (((df[amt]==1) & (df[sc]<thresholds[0]))
            |((df[amt]==2) & (df[sc]<thresholds[1]))
            |((df[amt]==3) & (df[sc]<thresholds[2]))
            )

pass_mask.values
>> array([ True, False, False, False,  True, False])

Upvotes: 0

Views: 80

Answers (1)

Chris Adams
Chris Adams

Reputation: 18647

You could covert thresholds to a dict and use Series.map:

d = dict(enumerate(thresholds, 1))

# d: {1: 0.7, 2: 0.8, 3: 0.9}

pass_mark = df['order_amount_bucket'].map(d) > df['score']

[out]

print(pass_mark.values)
array([ True, False, False, False,  True, False])

Upvotes: 2

Related Questions