Reputation: 2170
I am trying to create a new column in a pandas DataFrame using a function that takes two columns as arguments:
def ipf_cat(var, con):
    if var == "Idiopathic pulmonary fibrosis":
        if con in range(95,100):
            result = 4
        if con in range(70,95):
            result = 3
        if con in range(50,70):
            result = 2
        if con in range(0,50):
            result = 1
    return result
And then I call it like this:
df['ipf_category'] = ipf_cat(df['dx1'], df['dxcon1'])
Here df['dx1'] is a column of strings and df['dxcon1'] is a column of integers from 0 to 100. The function works fine in plain Python, but when I call it on the columns I keep getting this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have seen previous answers, such as "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()", but I can't work out how to apply those solutions to my particular function.
Reputation: 210812
I'd use the pd.cut() method:
Source DF
In [157]: df
Out[157]:
   con                            var
0   53                            ???
1   97  Idiopathic pulmonary fibrosis
2   75                            ???
3   11  Idiopathic pulmonary fibrosis
4   70                            ???
5   52  Idiopathic pulmonary fibrosis
6   74                            ???
7   25  Idiopathic pulmonary fibrosis
8   92                            ???
9   80  Idiopathic pulmonary fibrosis
Solution:
In [158]: df['ipf_category'] = -999
     ...:
     ...: bins = [-1, 50, 70, 95, 101]
     ...: labels = [1,2,3,4]
     ...:
     ...: df.loc[df['var']=='Idiopathic pulmonary fibrosis', 'ipf_category'] = \
     ...:     pd.cut(df['con'], bins=bins, labels=labels)
     ...:
In [159]: df
Out[159]:
   con                            var  ipf_category
0   53                            ???          -999
1   97  Idiopathic pulmonary fibrosis             4
2   75                            ???          -999
3   11  Idiopathic pulmonary fibrosis             1
4   70                            ???          -999
5   52  Idiopathic pulmonary fibrosis             2
6   74                            ???          -999
7   25  Idiopathic pulmonary fibrosis             1
8   92                            ???          -999
9   80  Idiopathic pulmonary fibrosis             3
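One thing to watch with the bins [-1, 50, 70, 95, 101]: pd.cut closes each interval on the right by default, so a value of exactly 50, 70 or 95 lands in the lower category, whereas the range() checks in the question put it in the higher one. Here is a minimal sketch that mirrors the question's half-open ranges. It assumes right=False is what is wanted and that 100 should count as category 4 (the question's range(95,100) leaves it out). The sample frame is made up, with boundary values chosen on purpose:

```python
import pandas as pd

# Hypothetical sample data; the boundary values are deliberate.
df = pd.DataFrame({
    'con': [0, 49, 50, 69, 70, 94, 95, 100, 60],
    'var': ['Idiopathic pulmonary fibrosis'] * 8 + ['???'],
})

bins = [0, 50, 70, 95, 101]
labels = [1, 2, 3, 4]

# right=False gives the intervals [0, 50), [50, 70), [70, 95), [95, 101)
cat = pd.cut(df['con'], bins=bins, labels=labels, right=False).astype(int)

# Keep the category for IPF rows and use the -999 placeholder elsewhere
is_ipf = df['var'] == 'Idiopathic pulmonary fibrosis'
df['ipf_category'] = cat.where(is_ipf, -999)
```

The astype(int) step only works when every value falls inside the bins; anything outside 0-100 would come back as NaN from pd.cut and would need handling first.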
Setup:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'con': np.random.randint(100, size=10),
    'var': np.random.choice(['Idiopathic pulmonary fibrosis', '???'], 10)
})
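As for why the original call fails: ipf_cat(df['dx1'], df['dxcon1']) passes two whole Series into the function, so var == "..." produces a boolean Series and the if statement cannot reduce it to a single True/False. If you would rather keep a scalar function, something along these lines applies it one row at a time. It is only a sketch: the -999 for non-matching rows is borrowed from the placeholder above, treating everything from 95 up as category 4 is my assumption, and the sample data is made up.

```python
import pandas as pd

def ipf_cat(var, con):
    """Scalar version: var is a single string, con is a single integer."""
    if var != "Idiopathic pulmonary fibrosis":
        return -999  # assumption: placeholder for other diagnoses
    if con >= 95:
        return 4
    if con >= 70:
        return 3
    if con >= 50:
        return 2
    return 1

# Hypothetical sample data using the question's column names
df = pd.DataFrame({
    'dx1': ['Idiopathic pulmonary fibrosis', 'Asthma',
            'Idiopathic pulmonary fibrosis'],
    'dxcon1': [97, 80, 52],
})

# axis=1 hands the lambda one row at a time, so the ifs only see scalars
df['ipf_category'] = df.apply(
    lambda row: ipf_cat(row['dx1'], row['dxcon1']), axis=1
)
```

This is usually much slower than pd.cut on large frames, since the function is called once per row in Python.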