burtonian
burtonian

Reputation: 39

How to count the number of true values in a pandas DF column based on another column?

I have a df that is similar to this:

import pandas as pd
import as np
df = pd.DataFrame({'district':['A','B','C','D','B','B','A'], 'price': [10.0,20.0,50.0,27.0,21.0,19.0,12.0]})

I am trying to develop a function to see which district offers the most accommodation opportunities, given a budget constraint.

If I passed the following function:

def district_recommender(df, min_budget = 17.0, max_budget = 30.0)

Once called, the output should be:

"For your specified budget {} - {}, the B District has 3 opportunities within your criteria.format(min_budget, max_budget)"

I have tried many different attempts but to not much success. This is what i've come up with.

def district_recommender(df, min_budget, max_budget):

    columns = df[['district', 'price']].copy()
    for i in columns.iloc[:, 0]:
      if columns.iloc[:, 1] >= min_budget and columns.iloc[:, 1] <= max_budget:
        return columns.iloc[:, 0].count().sum()

I think that i'm having a issue with counting the boolean values from the if-condition, as the error message i'm receiving is:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any suggestions would be much appreciated! Thanks

Upvotes: 0

Views: 1038

Answers (2)

azro
azro

Reputation: 54148

You may iterate on the different district values, and for each check if all prices are in the range, using Series.between

def district_recommender(df, min_budget=17.0, max_budget=30.0):
    for district in df['district'].unique():
        values = df[df['district'] == district]['price']
        if values.between(min_budget, max_budget).all():
            return f"For your specified budget [{min_budget};{max_budget}], the {district} " \
                   f"District has {len(values)} opportunities"

Upvotes: 1

Randy
Randy

Reputation: 14847

One way to get at this would be to do the filtering for price and then just take the value_counts of the resulting set, which will give you counts for every district.

In [350]: df[df['price'].between(17, 30)]['district'].value_counts()
Out[350]:
B    3
D    1
Name: district, dtype: int64

which you could then use to get your narrative output:

In [354]: vc = df[df['price'].between(17, 30)]['district'].value_counts()

In [355]: print(f"District {vc.idxmax()} has {vc.max()} options in your range.")
District B has 3 options in your range.

Upvotes: 2

Related Questions