Lostsoul

Reputation: 25999

Setting column to true/false based on comparison of two other columns in pandas?

I have the following dataframe and I want to compare the columns value and predicted. If they match, I want to set a new column "provided" to False. I'm having difficulty doing this.

Here's my data:

  ticker  periodDate       value   predicted
0    ibm        2017  150079.080  150079.080
1    ibm        2016   49799.140   49799.140
2    ibm        2015     459.016   45949.016

I want a new column that simply holds True/False depending on whether value and predicted match. I tried this, but to no avail:

def provideOrPredicted(df):
  if df['value'] == df['predicted']:
    df['provided'] = False
  elif df['value'] != df['predicted']:
    df['provided'] = True
  print(df)

provideOrPredicted(MergedDF)

I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 5

Views: 7518

Answers (2)

smci

Reputation: 33938

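Because the result of your ==/!= comparison is vectorized: df['value'] != df['predicted'] (or equivalently df['value'].ne(df['predicted'])) returns a boolean Series with one entry per row, not a single True or False.

But Python's built-in if statement knows nothing about pandas or numpy: it needs a single truth value (a scalar such as True or False), and pandas refuses to collapse a whole Series into one, which is what raises the ValueError.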
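To make the cause of the error concrete, here is a minimal sketch that rebuilds the value and predicted columns from the question (the frame name df and the variables mask and message are just for illustration) and shows that the comparison yields a whole Series, which an if statement cannot reduce to one truth value:

```python
import pandas as pd

# Sample data copied from the question (only the two compared columns).
df = pd.DataFrame({
    "value": [150079.080, 49799.140, 459.016],
    "predicted": [150079.080, 49799.140, 45949.016],
})

# The comparison is element-wise: one boolean per row.
mask = df["value"] == df["predicted"]
print(mask.tolist())

# Using the Series in an if statement asks for a single truth value,
# which pandas refuses to guess for a multi-element Series.
message = None
try:
    if mask:
        pass
except ValueError as err:
    message = str(err)
    print(message)
```

If you really do want a single answer for the whole column, mask.all() or mask.any() give one, but that is a different question from building a per-row column.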

So do the (vectorized) assignment directly in pandas, without any if-statement:

df['provided'] = df['value'].ne(df['predicted'])
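As a rough end-to-end sketch of the assignment above, assuming the frame from the question is rebuilt by hand under the name MergedDF (the helper variable provided exists only so the result is easy to inspect):

```python
import pandas as pd

# Data copied from the question.
MergedDF = pd.DataFrame({
    "ticker": ["ibm", "ibm", "ibm"],
    "periodDate": [2017, 2016, 2015],
    "value": [150079.080, 49799.140, 459.016],
    "predicted": [150079.080, 49799.140, 45949.016],
})

# .ne() is the method form of !=, so "provided" is False where the
# two columns match and True where they differ.
MergedDF["provided"] = MergedDF["value"].ne(MergedDF["predicted"])

provided = MergedDF["provided"].tolist()
print(MergedDF)
```

With this data the first two rows match, so they get False, and the 2015 row gets True.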

Upvotes: -2

ozturkib

Reputation: 1643

The line below compares the two columns row by row and assigns the boolean result (True where the values match) to the new column provided:

 df['provided'] = df['value'] == df['predicted']
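Note that == marks matching rows as True, while the question asks for False on a match. A small sketch of how the two relate, using a hypothetical intermediate column named matches (not in the original post) and the ~ operator to invert a boolean Series:

```python
import pandas as pd

# Only the two compared columns from the question.
df = pd.DataFrame({
    "value": [150079.080, 49799.140, 459.016],
    "predicted": [150079.080, 49799.140, 45949.016],
})

# "matches" is an illustrative column name: True where the values are equal.
df["matches"] = df["value"] == df["predicted"]

# ~ flips every boolean, giving False where the values are equal.
df["provided"] = ~df["matches"]

matches = df["matches"].tolist()
provided = df["provided"].tolist()
print(df)
```

Which of the two you assign to provided depends only on which meaning you want the column to carry.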

Upvotes: 6
