Pandas, creating a column based on ranges

I am trying to create a new column whose value depends on which range a word-count column falls into. However, I am getting ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I am trying to use the same column twice to make the range, but it does not work. Where is the problem?

df.loc[(df["count_words"] > 100 & df["count_words"] <= 300),  "length"] = "keskipitkä"
df.loc[df["count_words"] <= 100, "length"] = "lyhyt"
df.loc[df["count_words"] > 300,  "length"] = "pitkä"

Upvotes: 1

Views: 101

Answers (1)

jezrael

Reputation: 862406

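The problem is the missing parentheses. The & operator has higher precedence than the comparison operators, so Python evaluates 100 & df["count_words"] first and then treats the rest as a chained comparison, which tries to convert a Series to a single True/False and raises the ValueError. Wrap each comparison in its own parentheses: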

df.loc[(df["count_words"] > 100) & (df["count_words"] <= 300),  "length"] = "keskipitkä"
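
For reference, here is a minimal runnable sketch of all three assignments with the parentheses in place. The sample values are made up for illustration, and the short name words is just a local alias for the column, not something from the question.

```python
import pandas as pd

# Made-up sample data; only the column name comes from the question.
df = pd.DataFrame({"count_words": [10, 100, 200, 300, 4999]})
words = df["count_words"]

# Without parentheses, & binds tighter than > and <=, so this line
# raises the "truth value of a Series is ambiguous" ValueError.
try:
    df.loc[words > 100 & words <= 300, "length"] = "keskipitkä"
except ValueError as exc:
    print("ValueError:", exc)

# With each comparison parenthesized, & combines two boolean Series.
df.loc[words <= 100, "length"] = "lyhyt"
df.loc[(words > 100) & (words <= 300), "length"] = "keskipitkä"
df.loc[words > 300, "length"] = "pitkä"

print(df)
```

Every row ends up in exactly one of the three ranges, so no value in the new column is left as NaN.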

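Another option is to use pd.cut. Its bins include the right edge by default, so 100 falls into the first bin and 300 into the second, which matches the conditions above: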

import numpy as np
import pandas as pd

df = pd.DataFrame({'count_words': [10, 100, 200, 300, 4999]})

# Bins are (-inf, 100], (100, 300], (300, inf), one label per bin
df["length"] = pd.cut(df["count_words"],
                      bins=[-np.inf, 100, 300, np.inf],
                      labels=['lyhyt', 'keskipitkä', 'pitkä'])
print(df)
   count_words      length
0           10       lyhyt
1          100       lyhyt
2          200  keskipitkä
3          300  keskipitkä
4         4999       pitkä
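
If the boundaries should instead belong to the upper range (for example, 100 counted as keskipitkä), pd.cut accepts right=False. The sketch below compares the two settings on the same sample values; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

counts = pd.Series([10, 100, 200, 300, 4999], name="count_words")
bins = [-np.inf, 100, 300, np.inf]
labels = ["lyhyt", "keskipitkä", "pitkä"]

# Default (right=True): each bin includes its right edge, e.g. (100, 300].
right_closed = pd.cut(counts, bins=bins, labels=labels)

# right=False: each bin includes its left edge instead, e.g. [100, 300).
left_closed = pd.cut(counts, bins=bins, labels=labels, right=False)

print(right_closed.tolist())  # 100 -> lyhyt, 300 -> keskipitkä
print(left_closed.tolist())   # 100 -> keskipitkä, 300 -> pitkä
print(right_closed.dtype)     # category
```

Note that pd.cut returns a categorical column; if plain strings are needed, calling .astype(str) on the result should convert it.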

Upvotes: 1
