Amistad

Reputation: 7400

Adding values to a new column in Pandas depending on values in an existing column

I have a pandas dataframe as follows:

    Name  Age       City    Country  percentage
a   Jack   34     Sydney  Australia        0.23
b   Riti   30      Delhi      India        0.45
c  Vikas   31     Mumbai      India        0.55
d  Neelu   32  Bangalore      India        0.73
e   John   16   New York         US        0.91
f   Mike   17  las vegas         US        0.78

I am planning to add one more column called bucket whose definition depends on the percentage column as follows:

less than 0.25 = 1 
between 0.25 and 0.5 = 2
between 0.5 and 0.75 = 3
greater than 0.75 = 4

I tried building lists of conditions and choices and passing them to np.select, as follows:

conditions = [(df_obj['percentage'] < .25),
              (df_obj['percentage'] >=.25 & df_obj['percentage'] < .5),
              (df_obj['percentage'] >=.5 & df_obj['percentage'] < .75),
              (df_obj['percentage'] >= .75)]
choices = [1,2,3,4]
df_obj['bucket'] = np.select(conditions, choices)

However, this raises the following error on the line where I create the conditions:

TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]

Upvotes: 0

Views: 105

Answers (2)

Quang Hoang

Reputation: 150735

A quick fix is to wrap each comparison in its own parentheses, because & binds more tightly than >= and <, so .25 & df_obj['percentage'] gets evaluated first. For example:

((df_obj['percentage'] >=.25) & (df_obj['percentage'] < .5) )
 ^                          ^   ^                         ^
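Applied to the whole snippet from the question, the fix might look like the sketch below. The DataFrame here is rebuilt with only the percentage column from the question's sample data, and the pct shorthand is introduced just for readability.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question (only the column that matters here).
df_obj = pd.DataFrame(
    {"percentage": [0.23, 0.45, 0.55, 0.73, 0.91, 0.78]},
    index=list("abcdef"),
)

pct = df_obj["percentage"]  # shorthand, not part of the original code

# Each comparison is parenthesised so it is evaluated before `&`.
conditions = [
    pct < 0.25,
    (pct >= 0.25) & (pct < 0.5),
    (pct >= 0.5) & (pct < 0.75),
    pct >= 0.75,
]
choices = [1, 2, 3, 4]

df_obj["bucket"] = np.select(conditions, choices)
print(df_obj)
```

Rows that match none of the conditions (for example NaN) would get np.select's default value of 0.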

However, I think it's cleaner with pd.cut:

pd.cut(df['percentage'], bins=[0,0.25, 0.5, 0.75, 1],
       include_lowest=True, right=False,
       labels=[1,2,3,4])
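As a minimal sketch of how the pd.cut call could be assigned to the new column, assuming a frame named df that holds only the percentage values from the question:

```python
import pandas as pd

# Sample values copied from the question.
df = pd.DataFrame({"percentage": [0.23, 0.45, 0.55, 0.73, 0.91, 0.78]})

# right=False makes every bin closed on the left: [0, 0.25), [0.25, 0.5), ...
# A value of exactly 1.0 would fall outside the last bin and come back as NaN.
df["bucket"] = pd.cut(
    df["percentage"],
    bins=[0, 0.25, 0.5, 0.75, 1],
    include_lowest=True,
    right=False,
    labels=[1, 2, 3, 4],
)

print(df["bucket"].tolist())
print(df["bucket"].dtype)  # categorical, not a plain integer column
```

Because labels are given, the result is a categorical column; if there are no missing values, .astype(int) converts it to plain integers.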

Or, since your buckets are all the same width (0.25):

df['bucket'] = (df['percentage']//0.25).add(1).astype(int)

Output

    Name  Age       City    Country  percentage  bucket
a   Jack   34     Sydney  Australia        0.23       1
b   Riti   30      Delhi      India        0.45       2
c  Vikas   31     Mumbai      India        0.55       3
d  Neelu   32  Bangalore      India        0.73       3
e   John   16   New York         US        0.91       4
f   Mike   17  las vegas         US        0.78       4
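One caveat with the floor-division version: a percentage of exactly 1.0 would land in a fifth bucket. If that can occur in your data, a possible guard is to clip the result, as in this sketch. The 1.0 row is a hypothetical value added for illustration and is not part of the question's data.

```python
import pandas as pd

# Values from the question plus one hypothetical edge case (1.0).
df = pd.DataFrame({"percentage": [0.23, 0.45, 0.55, 0.73, 0.91, 0.78, 1.0]})

# Floor division gives 0..3 for values in [0, 1) and 4 for exactly 1.0;
# adding 1 and clipping at 4 keeps 1.0 in the top bucket.
df["bucket"] = (df["percentage"] // 0.25).add(1).clip(upper=4).astype(int)

print(df)
```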

Upvotes: 2

Danilo Filippo

Reputation: 972

I think the easiest/most readable way to do this is to use the apply function:

def percentage_to_bucket(percentage):
    if percentage < .25:
        return 1
    elif percentage >= .25 and percentage < .5:
        return 2
    elif percentage >= .5 and percentage < .75:
        return 3
    else:
        return 4

df["bucket"] = df["percentage"].apply(percentage_to_bucket)

Series.apply calls the passed function on each value of the column and returns a new pandas Series with the results, which you can then assign to your new column.
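To try this end to end, here is a self-contained sketch using only the percentage values from the question. The function is a slightly shortened variant of the one above: since each elif is reached only when the earlier checks failed, the lower bounds can be dropped.

```python
import pandas as pd

def percentage_to_bucket(percentage):
    # Earlier branches already exclude smaller values,
    # so each branch only needs its upper bound.
    if percentage < 0.25:
        return 1
    elif percentage < 0.5:
        return 2
    elif percentage < 0.75:
        return 3
    else:
        return 4

# Sample values copied from the question.
df = pd.DataFrame({"percentage": [0.23, 0.45, 0.55, 0.73, 0.91, 0.78]})
df["bucket"] = df["percentage"].apply(percentage_to_bucket)

print(df)
```

Note that apply runs the Python function once per row, so on large frames it will generally be slower than vectorised approaches, and a NaN percentage would fall through to the else branch and be labelled 4.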

Upvotes: 2
