Reputation: 33
I want to create a new column in a dataframe based on if/then logic. The rules for the actual problem are the output of a CART tree, so they are fairly complex. The problem is that when I try to apply the function to my dataframe, I get the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I am pretty sure that this is because the 'if' logic is trying to evaluate the input as a series as opposed to on a row by row basis. I just can't figure out the solution.
To replicate:
import pandas as pd
import numpy as np
np.random.seed(1)
#create sample dataframe
df_test = pd.DataFrame({"llflag": np.random.normal(0,1,100)})
#sample if/else logic
def tree1(df):
    if df['llflag'] <= 0.5:
        return 4
    else:
        return 3
    return
#attempt to apply function to df
df_test['testRR'] = df_test.apply(tree1(df_test ), axis = 1)
I got the same result with:
df_test['testRR'] = df_test.apply(lambda x: tree1( df_test), axis = 1)
What am I missing? Thanks in advance.
Upvotes: 0
Views: 69
Reputation: 8790
You want to apply the function to each row, not apply the result of calling the function on df_test (which fails), so pass tree1 itself and remove the (df_test) call:
df_test['testRR'] = df_test.apply(tree1, axis = 1)
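To make the difference concrete, here is a minimal sketch using the question's own sample data. The parameter is renamed to row (my naming, not the asker's) to stress that with axis=1 the function receives one row at a time, so row['llflag'] is a single number and the if works.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df_test = pd.DataFrame({"llflag": np.random.normal(0, 1, 100)})

def tree1(row):
    # With axis=1, `row` is a Series holding one row, so row['llflag'] is a scalar
    if row['llflag'] <= 0.5:
        return 4
    return 3

# Calling tree1 on the whole frame compares an entire column at once,
# and `if` cannot reduce that boolean Series to a single True/False
try:
    tree1(df_test)
    raised = False
except ValueError:
    raised = True

# Passing the function object lets pandas call it once per row
df_test['testRR'] = df_test.apply(tree1, axis=1)
```

The lambda attempt in the question fails for the same reason: it ignores its argument x and calls tree1 on the full dataframe again.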
I'd also discourage using apply where possible, so here's a different, faster version:
df_test['testRR'] = np.where(df_test['llflag'] <= 0.5, 4, 3)
Or a list comp version (also faster):
def tree2(row):
    return 4 if row <= 0.5 else 3
df_test['testRR'] = [tree2(row) for row in df_test["llflag"]]
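Since the real rules come from a CART tree with more than one split, the np.where idea may extend more naturally through np.select, with one condition per root-to-leaf path. This is only a sketch: the second column x2, its threshold, and the leaf values 4, 5 and 3 are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({
    "llflag": np.random.normal(0, 1, 100),
    "x2": np.random.normal(0, 1, 100),  # hypothetical second feature
})

# One boolean mask per leaf of the (hypothetical) tree; the first match wins
conditions = [
    (df["llflag"] <= 0.5) & (df["x2"] <= 0.0),
    (df["llflag"] <= 0.5) & (df["x2"] > 0.0),
]
choices = [4, 5]

# Rows matching no condition (llflag > 0.5 here) get the default leaf value
df["testRR"] = np.select(conditions, choices, default=3)
```

Each mask is evaluated on whole columns, so the combining operators are & and | with parentheses around each comparison, not and/or, which would raise the same ambiguous truth value error.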
Upvotes: 3
Reputation: 72
Remove the (df_test) from this line:
df_test['testRR'] = df_test.apply(tree1(df_test ), axis = 1)
Passing tree1 itself, as in df_test.apply(tree1, axis = 1), applies the function to each row.
Upvotes: 1