Reputation: 7273
I am trying to create a new column whose values come from comparing two columns of a dataframe. Here is what I tried:
def determinecolor(row,column1,column2):
    if row[column1] == row[column2]:
        val = 'k'
    elif row[column1] > row[column2]:
        val = 'r'
    else:
        val = 'g'
    return val

datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)
The error I received:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-182-31188e414958> in <module>()
      2 # if(test_shifted['openshifted'][0] > test_pred_list[0]): print("red")
      3 datasetTest.loc[:,'predict_close'] = pd.Series(test_pred_list)
----> 4 datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)
      5
      6 # datasetTest['color_predicted'] = datasetTest.apply(determinePredictedcolor, axis=1)

<ipython-input-178-d1f3e204fd17> in determinecolor(row, column1, column2)
      1 def determinecolor(row,column1,column2):
----> 2     if row[column1] == row[column2]:
      3         val = 'k'
      4     elif row[column1] > row[column2]:
      5         val = 'r'

c:\python35\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1119         raise ValueError("The truth value of a {0} is ambiguous. "
   1120                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1121                          .format(self.__class__.__name__))
   1122
   1123     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Please help me solve this issue.
EDITED
Here is a sample dataset:
open      high      low       close     closeTarget  openshifted  predict_close
0.104167  0.119048  0.117647  0.145833  0.104167     0.416667     0.881613
0.416667  0.285714  0         0.104167  0.4375       0.833333     0.684905
0.833333  0.761905  0.45098   0.4375    0.791667     0.8125       0.821244
0.8125    0.761905  0.784314  0.791667  0.770833     0.8125       0.920608
0.8125    0.761905  0.823529  0.770833  0.8125       0.916667     0.853668
Upvotes: 1
Views: 161
Reputation: 323226
Chain two np.where calls:
x = df['openshifted'] - df['close']
df['color_original'] = np.where(x > 0, 'r', np.where(x == 0, 'k', 'g'))
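For anyone who wants to try the chained np.where on its own, here is a minimal sketch. The three rows are made-up values (not the asker's data), picked only so that each of the three outcomes shows up once.

```python
import numpy as np
import pandas as pd

# Made-up values (an assumption for illustration) covering equal, less and greater.
df = pd.DataFrame({
    'openshifted': [0.50, 0.25, 0.75],
    'close':       [0.50, 0.75, 0.25],
})

x = df['openshifted'] - df['close']

# The outer where picks 'r' when openshifted > close; the inner where
# splits the remaining rows into equal ('k') and less ('g').
df['color_original'] = np.where(x > 0, 'r', np.where(x == 0, 'k', 'g'))

print(df['color_original'].tolist())  # ['k', 'g', 'r']
```

The whole column is computed in one pass, so no per-row Python function call is involved.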
Upvotes: 1
Reputation: 164623
You should not use pd.DataFrame.apply for vectorisable operations. You can use numpy.select instead to supply a list of conditions and values, along with a default value for all other scenarios:
conditions = [df['col1'] == df['col2'], df['col1'] > df['col2']]
values = ['k', 'r']
df['color_original'] = np.select(conditions, values, 'g')
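As a rough sketch of how this could be wrapped up with the column names from the question: the helper name color_column and the sample values below are my own assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

def color_column(frame, column1, column2):
    """Hypothetical helper: one of 'k', 'r' or 'g' per row, via np.select."""
    conditions = [
        frame[column1] == frame[column2],  # equal   -> 'k'
        frame[column1] > frame[column2],   # greater -> 'r'
    ]
    values = ['k', 'r']
    # Rows matching neither condition fall through to the default 'g'.
    return np.select(conditions, values, default='g')

# Made-up values (an assumption for illustration) covering all three cases.
df = pd.DataFrame({
    'openshifted': [0.50, 0.25, 0.75],
    'close':       [0.50, 0.75, 0.25],
})
df['color_original'] = color_column(df, 'openshifted', 'close')
```

Conditions are checked in order and the first match wins, so adding another colour only means appending one entry to each list.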
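The reason for your error is that you are misusing pd.DataFrame.apply, which passes each row to a function (with axis=1). Writing determinecolor(datasetTest, 'openshifted', 'close') calls the function immediately on the whole dataframe, so row[column1] is an entire Series and the comparison inside the if statement has no single truth value. You don't need to pass the dataframe explicitly as an argument; pass the function itself and give the extra arguments as keywords: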
df['color_original'] = df.apply(determinecolor, column1='openshifted',
                                column2='close', axis=1)
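To see both behaviours side by side, here is a small sketch. It assumes a function shaped like the one in the question and uses made-up sample values; the variable error_message exists only for this demonstration.

```python
import pandas as pd

def determinecolor(row, column1, column2):
    # Same logic as the function in the question, one row at a time.
    if row[column1] == row[column2]:
        return 'k'
    elif row[column1] > row[column2]:
        return 'r'
    return 'g'

# Made-up values (an assumption for illustration) covering all three cases.
df = pd.DataFrame({
    'openshifted': [0.50, 0.25, 0.75],
    'close':       [0.50, 0.75, 0.25],
})

# Calling the function on the whole dataframe compares two Series inside
# an `if`, which raises the ValueError from the question.
try:
    determinecolor(df, 'openshifted', 'close')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Passing the function itself lets apply call it once per row; the extra
# keyword arguments are forwarded to determinecolor.
df['color_original'] = df.apply(determinecolor, column1='openshifted',
                                column2='close', axis=1)
```

This works, but it still runs a Python function once per row, so the vectorised approach should be preferred on large frames.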
Upvotes: 5