Reputation: 7
I want to fill a column in my dataset based on a condition on another column. You can see the dataset here.
From the column "Average hour", I create the column "Car type" and try to fill it with this function:
def sample(df, i, steps):
    for i in range(steps):
        if(df["Average hour"].all())<70:
            df["Car type"].fillna("Mini truck").all()
        elif(df["Average hour"].all()>70 and df["Average hour"].all()<90):
            df["Car type"].fillna("VAN").all()
        elif(df["Average hour"].all()>90 and df["Average hour"].all()<100):
            df["Car type"].fillna("Bus").all()
        elif(df["Average hour"].all()>100 and df["Average hour"].all()<120):
            df["Car type"].fillna("SUV").all()
        elif(df["Average hour"].all()>120):
            df["Car type"].fillna("PickUP truck").all()
    return df
When I create the new column it contains only NaN values, so I used .fillna(). The error message in the terminal told me to use .all() as well, but I am still confused and the function does not work. If you advise using np.where, can you explain how to use it? Maybe I am missing something?
Upvotes: 0
Views: 58
Reputation: 37737
Here is another way to create a categorical column: use pandas.cut.
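import pandas as pd

# Bin edges; with the default right=True each bin is (left, right], so 70 lands in 'Mini truck'
bins = [0, 70, 90, 100, 120, float("inf")]
labels = ['Mini truck', 'VAN', 'Bus', 'SUV', 'PickUP truck']
df['Car type'] = pd.cut(df['Average hour'], bins, labels=labels)
print(df)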
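To make the boundary behaviour concrete, here is a minimal sketch on made-up sample values (the data and the column names `right_closed` / `left_closed` are assumptions for illustration only). It compares the default `right=True` with `right=False`, which decides whether a value such as exactly 70 falls into the lower or the upper bin.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"Average hour": [50, 70, 89, 90, 110, 130]})

bins = [0, 70, 90, 100, 120, float("inf")]
labels = ["Mini truck", "VAN", "Bus", "SUV", "PickUP truck"]

# Default right=True: bins are (0, 70], (70, 90], ... so 70 is "Mini truck".
df["right_closed"] = pd.cut(df["Average hour"], bins, labels=labels)

# right=False: bins are [0, 70), [70, 90), ... so 70 is "VAN".
df["left_closed"] = pd.cut(df["Average hour"], bins, labels=labels, right=False)

print(df)
```

Which variant is right depends on how you want the edge values treated; the question's conditions do not say, since they skip 70, 90, 100 and 120 entirely.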
Upvotes: 2
Reputation: 6337
Based on your image, I think apply() with a custom function f solves it for you.
def f(x):
    if x<70:
        return "Mini truck"
    elif 70<x<90: # because of the branch before, this could be simplified to x<90
        return "VAN"
    elif 90<x<100:
        return "Bus"
    elif 100<x<120:
        return "SUV"
    elif 120<x:
        return "PickUP truck"

df["Car type"] = df["Average Hour"].apply(f)
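import pandas as pd
from io import StringIO

# Sample data; columns are separated by two or more spaces
t = """Brand  Average Hour
Audi  122
BWM  89
"""
# A regex separator requires the Python parsing engine
df = pd.read_csv(StringIO(t), sep=r"\s\s+", index_col=0, engine="python")
df["Car type"] = df["Average Hour"].apply(f)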
>>> df
       Average Hour      Car type
Brand
Audi            122  PickUP truck
BWM              89           VAN
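Since the question asks about np.where, here is a minimal vectorized sketch on made-up sample values (the data and the extra column name "Car type (select)" are assumptions). For context: the original loop cannot work because `df["Average hour"].all()` collapses the whole column into a single True/False, and `fillna()` returns a new Series that is then thrown away. `np.where(condition, a, b)` instead picks `a` or `b` row by row, and `np.select` does the same for a list of conditions, taking the first one that matches. Note that this sketch uses `<` in sequence, so the edge values 70, 90, 100 and 120 go to the upper category instead of being left unassigned.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"Average hour": [50, 89, 95, 110, 130]})
hours = df["Average hour"]

# Nested np.where: each call chooses row by row between a label and the next test.
df["Car type"] = np.where(hours < 70, "Mini truck",
                 np.where(hours < 90, "VAN",
                 np.where(hours < 100, "Bus",
                 np.where(hours < 120, "SUV", "PickUP truck"))))

# np.select: same logic, flatter to read; the first matching condition wins.
conditions = [hours < 70, hours < 90, hours < 100, hours < 120]
choices = ["Mini truck", "VAN", "Bus", "SUV"]
df["Car type (select)"] = np.select(conditions, choices, default="PickUP truck")

print(df)
```

With more than two or three categories, np.select (or pandas.cut) is usually easier to maintain than deeply nested np.where calls.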
Upvotes: 1