Jaffer Wilson

Reputation: 7273

Passing rows to a function gives an error in pandas (Python)

I am trying to create a new column whose values come from comparing two columns of the dataframe. Here is what I tried:

def determinecolor(row,column1,column2):
    if row[column1] == row[column2]:
        val = 'k'
    elif row[column1] > row[column2]:
        val = 'r'
    else:
        val = 'g'
    return val
datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)

The error I received:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-182-31188e414958> in <module>()
      2 # if(test_shifted['openshifted'][0] > test_pred_list[0]): print("red")
      3 datasetTest.loc[:,'predict_close'] = pd.Series(test_pred_list)
----> 4 datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)
      5 
      6 # datasetTest['color_predicted'] = datasetTest.apply(determinePredictedcolor, axis=1)

<ipython-input-178-d1f3e204fd17> in determinecolor(row, column1, column2)
      1 def determinecolor(row,column1,column2):
----> 2     if row[column1] == row[column2]:
      3         val = 'k'
      4     elif row[column1] > row[column2]:
      5         val = 'r'

c:\python35\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1119         raise ValueError("The truth value of a {0} is ambiguous. "
   1120                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1121                          .format(self.__class__.__name__))
   1122 
   1123     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Kindly help me solve this issue.

EDITED
Here is a sample dataset:

open      high      low       close     closeTarget  openshifted  predict_close
0.104167  0.119048  0.117647  0.145833  0.104167     0.416667     0.881613
0.416667  0.285714  0         0.104167  0.4375       0.833333     0.684905
0.833333  0.761905  0.45098   0.4375    0.791667     0.8125       0.821244
0.8125    0.761905  0.784314  0.791667  0.770833     0.8125       0.920608
0.8125    0.761905  0.823529  0.770833  0.8125       0.916667     0.853668

Upvotes: 1

Views: 161

Answers (2)

BENY

Reputation: 323226

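You can chain two np.where calls:

import numpy as np
x = df['openshifted'] - df['close']  # positive where openshifted > close
df['color_original'] = np.where(x > 0, 'r', np.where(x == 0, 'k', 'g'))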

Upvotes: 1

jpp

Reputation: 164623

You should not use pd.DataFrame.apply for vectorisable operations.

You can use numpy.select instead to supply a list of conditions and values, along with a default value for all other scenarios:

conditions = [df['col1'] == df['col2'], df['col1'] > df['col2']]
values = ['k', 'r']

df['color_original'] = np.select(conditions, values, 'g')
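As a minimal sketch of the same idea with the question's column names: the three rows below are made-up values chosen to hit each branch, not the asker's data.

```python
import numpy as np
import pandas as pd

# Illustrative values only, chosen so that each branch is exercised.
df = pd.DataFrame({'openshifted': [0.4, 0.5, 0.2],
                   'close':       [0.1, 0.5, 0.9]})

# Conditions are checked in order; the first one that is True wins.
conditions = [df['openshifted'] == df['close'],
              df['openshifted'] > df['close']]
values = ['k', 'r']

# Rows matching neither condition fall back to the default, 'g'.
df['color_original'] = np.select(conditions, values, default='g')
```

With these values the new column should hold 'r', 'k' and 'g' in that order.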

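The error occurs because determinecolor(datasetTest, 'openshifted', 'close') is called immediately, with the whole dataframe as row, and its result (not the function) is handed to apply. Inside the function, row[column1] == row[column2] then compares two whole columns and produces a Series of booleans, which if cannot reduce to a single True or False; hence the ValueError. pd.DataFrame.apply with axis=1 already passes each row to the function, so pass the function itself and supply the extra arguments as keywords: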

df['color_original'] = df.apply(determinecolor, column1='openshifted',
                                column2='close', axis=1)
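To see both behaviours side by side, here is a small sketch that first reproduces the ValueError and then applies the function row by row. The dataframe holds made-up values for illustration, and determinecolor is restated from the question with early returns.

```python
import pandas as pd

def determinecolor(row, column1, column2):
    if row[column1] == row[column2]:
        return 'k'
    elif row[column1] > row[column2]:
        return 'r'
    return 'g'

# Illustrative values only, not the asker's data.
df = pd.DataFrame({'openshifted': [0.4, 0.5, 0.2],
                   'close':       [0.1, 0.5, 0.9]})

# Calling the function on the whole dataframe compares two Series,
# and `if` cannot turn that comparison into a single bool.
try:
    determinecolor(df, 'openshifted', 'close')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Passing the function itself lets apply call it once per row;
# the extra keyword arguments are forwarded to every call.
colors = df.apply(determinecolor, column1='openshifted',
                  column2='close', axis=1)
```

This is still slower than the vectorised version on large frames, so it is mainly useful when the per-row logic cannot be expressed with array operations.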

Upvotes: 5
