29Clyde

Reputation: 33

Apply if/else logic to dataframe in function: ValueError: The truth value of a Series is ambiguous

I want to create a new column in a DataFrame based on if/else logic. The rules for the actual problem are the output of a CART tree, so they are fairly complex. The problem is that when I try to apply the function to my DataFrame, I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I am pretty sure this is because the 'if' is evaluating the whole column as a Series rather than one row at a time, but I can't figure out the solution.
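A minimal illustration of that suspicion, using a toy two-element Series rather than the question's data: comparing a Series with a scalar is element-wise and returns a boolean Series, and `if` then has to call bool() on that whole Series, which pandas refuses to do. The variable names here are purely illustrative.

```python
import pandas as pd

s = pd.Series([0.2, 0.9])

# Comparing a Series with a scalar is element-wise: the result is a boolean Series.
mask = s <= 0.5
print(mask.tolist())  # [True, False]

# `if` needs a single True/False, so it calls bool() on the Series, which raises.
raised = False
try:
    if mask:
        pass
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Reducing the Series explicitly gives one bool that `if` can use.
print(bool(mask.any()), bool(mask.all()))  # True False
```

In the question, tree1 receives all of df_test, so df['llflag'] <= 0.5 is a 100-element boolean Series and hits exactly this error.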

To replicate:

import pandas as pd
import numpy as np
np.random.seed(1)

# create sample dataframe
df_test = pd.DataFrame({"llflag": np.random.normal(0, 1, 100)})

# sample if/else logic
def tree1(df):
    if df['llflag'] <= 0.5:
        return 4
    else:
        return 3

# attempt to apply function to df (this raises the ValueError)
df_test['testRR'] = df_test.apply(tree1(df_test), axis=1)

I got the same result with:

df_test['testRR'] = df_test.apply(lambda x: tree1(df_test), axis=1)

What am I missing? Thanks in advance.

Upvotes: 0

Views: 69

Answers (2)

Tom

Reputation: 8790

You want apply to call the function on each row, not to receive the result of calling the function on all of df_test (that call is what fails). So pass the function itself, without the parentheses:

df_test['testRR'] = df_test.apply(tree1, axis = 1)
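A self-contained sketch of that fix, reusing the question's sample data. The only change to tree1 is that its argument is renamed to row, to make clear that with axis=1 it receives one row (a Series) at a time, so row["llflag"] is a single float and the comparison yields one bool.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df_test = pd.DataFrame({"llflag": np.random.normal(0, 1, 100)})

def tree1(row):
    # With axis=1, `row` is one row as a Series, so row["llflag"] is a scalar
    # and the comparison yields a single bool that `if` can evaluate.
    if row["llflag"] <= 0.5:
        return 4
    else:
        return 3

# Pass the function object itself; apply calls it once per row.
df_test["testRR"] = df_test.apply(tree1, axis=1)
print(df_test.head())
```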

I'd also discourage using apply here: with axis=1 it calls the Python function once per row, which is slow. Here is a faster, vectorized version:

df_test['testRR'] = np.where(df_test['llflag'] <= 0.5, 4, 3)
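Since the real rules come from a CART tree with more than two leaves, one way to extend the vectorized approach is numpy.select, which takes a list of boolean conditions (one per root-to-leaf path) and a matching list of leaf values. This is a sketch under assumptions: the second feature x2, the thresholds, and the leaf values are hypothetical, not taken from the question.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({
    "llflag": np.random.normal(0, 1, 100),
    "x2": np.random.normal(0, 1, 100),  # hypothetical second feature
})

# Each condition is one root-to-leaf path of a hypothetical tree.
# np.select takes, per row, the value of the first condition that is True.
conditions = [
    (df["llflag"] <= 0.5) & (df["x2"] <= 0.0),
    (df["llflag"] <= 0.5) & (df["x2"] > 0.0),
]
leaf_values = [4, 5]

# Rows matching neither path (llflag > 0.5) get the default leaf value.
df["testRR"] = np.select(conditions, leaf_values, default=3)
```

Nested np.where calls can express the same rules; np.select tends to stay more readable as the number of leaves grows.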

Or a list comprehension version (also faster than apply):

def tree2(value):
    return 4 if value <= 0.5 else 3

df_test['testRR'] = [tree2(value) for value in df_test["llflag"]]
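If the tree splits on several columns, the list comprehension can walk them in parallel with zip, passing plain scalars to the rule function, so a CART tree's nested if/else can be pasted in almost verbatim. A sketch only: tree3, the column x2, and the thresholds are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df_test = pd.DataFrame({
    "llflag": np.random.normal(0, 1, 100),
    "x2": np.random.normal(0, 1, 100),  # hypothetical second feature
})

def tree3(llflag, x2):
    # Scalars in, one leaf value out: nested if/else works as written.
    if llflag <= 0.5:
        return 4 if x2 <= 0.0 else 5
    return 3

# zip walks both columns together, yielding one pair of scalars per row.
df_test["testRR"] = [tree3(a, b) for a, b in zip(df_test["llflag"], df_test["x2"])]
```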

Upvotes: 3

João Vicente

Reputation: 72

Remove the (df_test) so that apply receives the function itself:

df_test['testRR'] = df_test.apply(tree1, axis=1)

This will apply the function to each row.

Upvotes: 1
