Subtraction between columns operation to create a new column in the dataframe

Question

My current dataframe:

             Adj Close    High  high_shift  high_>_high
Date                
2017-01-03   14.676315   15.65      14.70        True
2017-01-04   14.676315   15.68      15.65        True
2017-01-05   14.913031   15.91      15.68        True
2017-01-06   14.827814   15.92      15.91        True
2017-01-09   14.515349   15.60      15.92        False
2017-01-10   14.657379   15.68      15.60        True
2017-01-11   14.827814   15.68      15.68        False
2017-01-12   15.055059   16.25      15.68        True
2017-01-13   14.846750   15.95      16.25        False
2017-01-16   14.913031   15.75      15.95        False

If the value of the high column is greater than the value in the high_shift column I would like to create a new column by subtracting the values from the column rows adj close minus the row values from the high_shift * 100 columns.

Just example:

if (df.High > df.high_shift):
    df['new_column'] = (df['Adj Close'] - df['high_shift'])*100

If the value of the high column is NOT greater than the value of the high_shift column I would like the value in the new column row to be 0

I'm trying the following line of code but I'm getting error, I can't even print the result:

for i in df['high_>_high'], df['Close'], df['high_shift']:
    if df['high_>_high'][i]:
        (df['Close'][i] - df['high_shift'][i])*100

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I was able to make a column (high _> _ high_shift) showing when high > high_shift but I can't pass this as a condition to create a new column by subtracting the others

jirassimok · Accepted Answer

When dealing with numeric data in Pandas, it is usually a good idea to avoid Python loops (for/while) in favor of Pandas' vectorized functions.

In this case, you can use Series.clip, which brings values outside the given boundaries to the boundaries.

df['new_column'] = ((df['Adj Close'] - df['high_shift']) * 100).clip(0)
# (.clip(0) could also go after the inner parentheses)

Alternatively, you can clip the column in-place afterwards.

df['new_column'] = (df['Adj Close'] - df['high_shift']).clip(0) * 100
df['new_column'].clip(0, inplace=True)

For a more general case than truncate values to a range, you could use boolean indexing on a series (or a DataFrame). (If you want to learn more., that page discusses the many indexing Pandas provides.)

df['new'] = ((df['Adj Close'] - df['high_shift']) * 100).clip(0)
# Set all values in column 'new' below 0 to 0
df['new'][df['new'] < 0] = 0

Subtraction between columns operation to create a new column in the dataframe

Answers (2)

Related Questions