Reputation: 465
I have a dataset with a column 'y' that contains particular values. I would like to create a new column 'z' from it: if y is 47472 then z should be 1000, if y < 1000 then z = y*2, and for all other values z should be 2000. Here's a mock example of the data. I don't have a 'z' column yet, but I want to create it:
y z
0 1751 2000
1 800 1600
2 10000 2000
3 350 700
4 750 1500
5 1750 2000
6 30000 2000
7 47472 1000
def test(y):
    if y == 47472:
        z = 1000
    elif y < 1000:
        z = y * 2
    else:
        z = 2000
    return z
# I tried to call the above function below
z = test(y)
z
but I don't get the result; instead it shows the error below:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Upvotes: 2
Views: 6276
Reputation: 6276
The problem is that you are using a Series in the if statement, such as:
if y == 47472:
Assuming that y is a column of your DataFrame, this comparison produces a Series of booleans, one per row:
>>> df['y']==47472
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 True
Name: y, dtype: bool
An if statement needs a single boolean, so a whole Series of booleans is ambiguous there. That is why the error message suggests reducing the Series with a method that returns one boolean, such as any() or all().
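To make the ambiguity concrete, here is a minimal sketch that rebuilds the mock data from the question (the DataFrame construction and the variable names mask, raised, has_match and all_match are assumptions for illustration) and shows the if statement failing while any() and all() each return a single boolean:

```python
import pandas as pd

# Mock data copied from the question
df = pd.DataFrame({"y": [1751, 800, 10000, 350, 750, 1750, 30000, 47472]})

# Element-wise comparison: one boolean per row, not a single True/False
mask = df["y"] == 47472

try:
    if mask:  # a Series has no single truth value, so this raises
        pass
    raised = False
except ValueError:
    raised = True

has_match = bool(mask.any())  # True if at least one row equals 47472
all_match = bool(mask.all())  # True only if every row equals 47472
```

Note that any() and all() only silence the error by collapsing the Series to one value; they do not give a per-row result, which is what the question actually needs.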
Instead you should use boolean indexing:
import numpy as np
import pandas as pd

# df is the DataFrame with your data
# add column z, initialised to zeros (a plain array avoids index-alignment surprises)
df['z'] = np.zeros(df.shape[0])
# if y == 47472, set z to 1000
df.loc[df['y'] == 47472, 'z'] = 1000
# where y < 1000, set z to 2*y
df.loc[df['y'] < 1000, 'z'] = 2 * df['y']
# set the rest to 2000 (i.e. rows that satisfy neither of the previous two conditions)
df.loc[(df['y'] >= 1000) & (df['y'] != 47472), 'z'] = 2000
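As a possible alternative that is not part of the original answer, the three rules can be written in one step with numpy.select, which picks the first matching condition per row and falls back to a default. This is only a sketch on the mock data from the question; the names conditions and choices are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Mock data copied from the question
df = pd.DataFrame({"y": [1751, 800, 10000, 350, 750, 1750, 30000, 47472]})

# Conditions are checked in order; the first one that is True wins for each row
conditions = [df["y"] == 47472, df["y"] < 1000]
# Value to use for each condition, in the same order
choices = [1000, df["y"] * 2]

# Rows matching neither condition get the default of 2000
df["z"] = np.select(conditions, choices, default=2000)
```

Because the default covers every remaining row, the explicit third condition (y >= 1000 and y != 47472) is not needed in this form.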
Edit: As EdChum commented, I was originally using chained indexing:
df['z'][df['y']<1000] = 2*df['y']
which should be avoided by using loc:
df.loc[df['y']<1000, 'z'] = 2*df['y']
Upvotes: 1