Reputation: 526
My current dataframe:
Adj Close High high_shift high_>_high
Date
2017-01-03 14.676315 15.65 14.70 True
2017-01-04 14.676315 15.68 15.65 True
2017-01-05 14.913031 15.91 15.68 True
2017-01-06 14.827814 15.92 15.91 True
2017-01-09 14.515349 15.60 15.92 False
2017-01-10 14.657379 15.68 15.60 True
2017-01-11 14.827814 15.68 15.68 False
2017-01-12 15.055059 16.25 15.68 True
2017-01-13 14.846750 15.95 16.25 False
2017-01-16 14.913031 15.75 15.95 False
If the value of the high column is greater than the value in the high_shift column I would like to create a new column by subtracting the values from the column rows adj close minus the row values from the high_shift * 100 columns.
Just example:
if (df.High > df.high_shift):
df['new_column'] = (df['Adj Close'] - df['high_shift'])*100
If the value of the high column is NOT greater than the value of the high_shift column I would like the value in the new column row to be 0
I'm trying the following line of code but I'm getting error, I can't even print the result:
for i in df['high_>_high'], df['Close'], df['high_shift']:
if df['high_>_high'][i]:
(df['Close'][i] - df['high_shift'][i])*100
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I was able to make a column (high _> _ high_shift) showing when high > high_shift but I can't pass this as a condition to create a new column by subtracting the others
Upvotes: 2
Views: 650
Reputation: 61910
Use numpy.where:
df['new_column'] = np.where(df.High > df.high_shift, (df.High - df.high_shift) * 100, 0)
print(df)
Output
Date Adj Close High high_shift high_>_high new_column
0 2017-01-03 14.676315 15.65 14.70 True 95.0
1 2017-01-04 14.676315 15.68 15.65 True 3.0
2 2017-01-05 14.913031 15.91 15.68 True 23.0
3 2017-01-06 14.827814 15.92 15.91 True 1.0
4 2017-01-09 14.515349 15.60 15.92 False 0.0
5 2017-01-10 14.657379 15.68 15.60 True 8.0
6 2017-01-11 14.827814 15.68 15.68 False 0.0
7 2017-01-12 15.055059 16.25 15.68 True 57.0
8 2017-01-13 14.846750 15.95 16.25 False 0.0
9 2017-01-16 14.913031 15.75 15.95 False 0.0
Upvotes: 2
Reputation: 4263
When dealing with numeric data in Pandas, it is usually a good idea to avoid Python loops (for
/while
) in favor of Pandas' vectorized functions.
In this case, you can use Series.clip, which brings values outside the given boundaries to the boundaries.
df['new_column'] = ((df['Adj Close'] - df['high_shift']) * 100).clip(0)
# (.clip(0) could also go after the inner parentheses)
Alternatively, you can clip the column in-place afterwards.
df['new_column'] = (df['Adj Close'] - df['high_shift']).clip(0) * 100
df['new_column'].clip(0, inplace=True)
For a more general case than truncate values to a range, you could use boolean indexing on a series (or a DataFrame). (If you want to learn more., that page discusses the many indexing Pandas provides.)
df['new'] = ((df['Adj Close'] - df['high_shift']) * 100).clip(0)
# Set all values in column 'new' below 0 to 0
df['new'][df['new'] < 0] = 0
Upvotes: 0