sre
sre

Reputation: 279

Group by of a float column using pandas

I need to group by pandas with weight

Name      weight(kg)
Person1    4.44
Person2   37.3
Person3   36.38
Person4   39.52
Person5   81.57
Person6   43.55
Person7   91.11
Person8    5
Person9   36.48
Person10  38.25

My code is below. Need to group by according to if condition. My code is below. But I am getting 0 to 20 for all the rows.

if 0 <= data_file['weight(kg)'].all() < 20:
    data_file['target'] = "0 to 20%"
if 20 < data_file['weight(kg)'].all() < 40:
    data_file['target'] = "20 to 40%"
if 40 < data_file['weight(kg)'].all() < 60:
    data_file['target'] = "40 to 60%"
if 60 < data_file['weight(kg)'].all() < 80:
    data_file['target'] = "60 to 80%"
if 80 < data_file['weight(kg)'].all() <= 100:
    data_file['target'] = "80 to 100%"

Expected out

Name     weight(kg) Target
Person1  4.44       0 to 20
Person2  37.3       20 to 40
Person3  36.38      20 to 40
Person4  39.52      20 to 40
Person5  81.57      80 to 100
Person6  43.55      40 to 60
Person7  91.11      80 to 100
Person8  5           0 to 20
Person9  36.48      20 to 40
Person10 38.25      20 to 40

Upvotes: 2

Views: 1672

Answers (3)

Terry
Terry

Reputation: 2811

You can do it using np.select

conditions = [
        (0 <= df['weight(kg)']) & (df['weight(kg)'] < 20)
     ,  (20 < df['weight(kg)']) & (df['weight(kg)'] < 40)
     ,  (40 < df['weight(kg)']) & (df['weight(kg)'] < 60)
     ,  (60 < df['weight(kg)']) & (df['weight(kg)'] < 80)
     ,  (80 < df['weight(kg)']) & (df['weight(kg)'] <= 100)
]

results = [
    "0 to 20%"
    ,"20 to 40%"
    ,"40 to 60%"
    ,"60 to 80%"
    ,"80 to 100%"
]

df['Target'] = np.select(conditions, results)

output:

    Name    weight(kg)  Target
0   Person1 4.44        0 to 20%
1   Person2 37.30       20 to 40%
2   Person3 36.38       20 to 40%
3   Person4 39.52       20 to 40%
4   Person5 81.57       80 to 100%

Upvotes: 3

Manuel
Manuel

Reputation: 113

Very simple, just try using apply in pandas and lambda function:

def classify(x):
    if 0 <= x < 20:
        y = "0 to 20%"
    if 20 < x < 40:
        y = "20 to 40%"
    if 40 < x < 60:
        y = "40 to 60%"
    if 60 < x < 80:
        y = "60 to 80%"
    if 80 < x <= 100:
        y = "80 to 100%"
    return y

the assuming your Dataframe has two columns 'Name' and 'Weight' we should type:

df['Target'] = df['weight'].apply(lambda x: classify(x))

I hope it helps

Extra: if you want progress bar you can add these lines:

from tqdm import tqdm
tqdm.pandas()
df['Target'] = df['weight'].progress_apply(lambda x: classify(x))

Upvotes: 3

pythonic833
pythonic833

Reputation: 3224

Use pd.cut

df.assign(bins = pd.cut(df["weight(kg)"], [0,20,40,60,80,100], 
                        labels=['0 to 20', '20 to 40', '40 to 60', '60 to 80', '80 to 100']))

result

      Name   weight(kg) bins
0   Person1     4.44    0 to 20
1   Person2     37.30   20 to 40
2   Person3     36.38   20 to 40
3   Person4     39.52   20 to 40
4   Person5     81.57   80 to 100
5   Person6     43.55   40 to 60
6   Person7     91.11   80 to 100
7   Person8     5.00    0 to 20
8   Person9     36.48   20 to 40
9   Person10    38.25   20 to 40

Upvotes: 6

Related Questions