TomP

Reputation: 11

Python: How to populate new Pandas dataframe columns with data from existing columns

Can someone help me out? I am getting

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() 

from the following code:

import pandas as pd

testdf = pd.read_csv('../../IBM.csv')

print(testdf)
print("------------")
testdf['NHigh'] = 0
print(testdf)

if testdf['Close'] > testdf['Open']:
    testdf['NHigh'] = testdf['Close'] * testdf['High']

print("********")
print(testdf)

What I am trying to do is create a new column populated with values computed from two existing columns, but only for rows where a condition is true.

The data is a stock dataframe with the columns Open, High, Low, Close, etc. I want to add a new column (NHigh) based on an operation between, say, Close and High, for each row where Close is greater than Open (the condition used in the code above).

Thanks if you can help....

Upvotes: 1

Views: 105

Answers (1)

jezrael

Reputation: 862661

I think you can use loc and fillna:

print(testdf)
                       Open    High     Low   Close  Volume
Date_Time                                                  
1997-02-03 09:04:00  3046.0  3048.5  3046.0  3047.5     505
1997-02-03 09:27:00  3043.5  3043.5  3043.0  3043.0      56
1997-02-03 09:28:00  3043.0  3044.0  3043.0  3044.0      32
1997-02-03 09:29:00  3044.5  3044.5  3044.5  3044.5      63
1997-02-03 09:30:00  3045.0  3045.0  3045.0  3045.0      28
1997-02-03 09:31:00  3045.0  3045.5  3045.0  3045.5      75

print(testdf['Close'] > testdf['Open'])
Date_Time
1997-02-03 09:04:00     True
1997-02-03 09:27:00    False
1997-02-03 09:28:00     True
1997-02-03 09:29:00    False
1997-02-03 09:30:00    False
1997-02-03 09:31:00     True
dtype: bool
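
The comparison gives one True/False per row, which is why a plain if raises the ValueError: if needs a single truth value. A minimal sketch of that behaviour, assuming a made-up two-row frame (the name df and its values are only for illustration, not the asker's CSV):

```python
import pandas as pd

# Hypothetical two-row sample standing in for the real stock data.
df = pd.DataFrame({"Open": [3046.0, 3043.5], "Close": [3047.5, 3043.0]})

# Element-wise comparison: the result is a boolean Series, one value per row.
mask = df["Close"] > df["Open"]

try:
    if mask:  # `if` asks for a single truth value, which a Series cannot give
        pass
    raised = False
except ValueError:
    raised = True

print(mask.tolist())
print(raised)
```

So the Series is better used as a row mask than as an if condition.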

testdf.loc[testdf['Close'] > testdf['Open'],'Nhigh'] = testdf['Close'] * testdf['High']
testdf['Nhigh'] = testdf['Nhigh'].fillna(0)
print(testdf)
                       Open    High     Low   Close  Volume       Nhigh
Date_Time                                                              
1997-02-03 09:04:00  3046.0  3048.5  3046.0  3047.5     505  9290303.75
1997-02-03 09:27:00  3043.5  3043.5  3043.0  3043.0      56        0.00
1997-02-03 09:28:00  3043.0  3044.0  3043.0  3044.0      32  9265936.00
1997-02-03 09:29:00  3044.5  3044.5  3044.5  3044.5      63        0.00
1997-02-03 09:30:00  3045.0  3045.0  3045.0  3045.0      28        0.00
1997-02-03 09:31:00  3045.0  3045.5  3045.0  3045.5      75  9275070.25
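
For anyone who wants to try the loc and fillna approach without the CSV, here is a self-contained sketch in Python 3. It assumes a small frame built by hand from the first three rows shown above (the name df and the default integer index are my own choices, not part of the original data):

```python
import pandas as pd

# First three rows of the sample above, rebuilt by hand (default integer index).
df = pd.DataFrame(
    {
        "Open": [3046.0, 3043.5, 3043.0],
        "High": [3048.5, 3043.5, 3044.0],
        "Close": [3047.5, 3043.0, 3044.0],
    }
)

mask = df["Close"] > df["Open"]

# Assign only on rows where the mask is True; the other rows get NaN.
df.loc[mask, "Nhigh"] = df["Close"] * df["High"]

# Replace the NaN left on the non-matching rows with 0.
df["Nhigh"] = df["Nhigh"].fillna(0)

print(df)
```

The right-hand side is computed for every row, but loc aligns it on the index and keeps only the masked rows.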

Another solution uses numpy.where:

import numpy as np

testdf['Nhigh'] = np.where(testdf['Close'] > testdf['Open'], testdf['Close'] * testdf['High'], 0)
print(testdf)
                       Open    High     Low   Close  Volume       Nhigh
Date_Time                                                              
1997-02-03 09:04:00  3046.0  3048.5  3046.0  3047.5     505  9290303.75
1997-02-03 09:27:00  3043.5  3043.5  3043.0  3043.0      56        0.00
1997-02-03 09:28:00  3043.0  3044.0  3043.0  3044.0      32  9265936.00
1997-02-03 09:29:00  3044.5  3044.5  3044.5  3044.5      63        0.00
1997-02-03 09:30:00  3045.0  3045.0  3045.0  3045.0      28        0.00
1997-02-03 09:31:00  3045.0  3045.5  3045.0  3045.5      75  9275070.25
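
The numpy.where variant can be sketched the same way in Python 3. Again this assumes a hand-built frame with the first three sample rows (df is an illustrative name, not the asker's data):

```python
import numpy as np
import pandas as pd

# First three rows of the sample above, rebuilt by hand (default integer index).
df = pd.DataFrame(
    {
        "Open": [3046.0, 3043.5, 3043.0],
        "High": [3048.5, 3043.5, 3044.0],
        "Close": [3047.5, 3043.0, 3044.0],
    }
)

# np.where(condition, value_if_true, value_if_false), evaluated row by row.
df["Nhigh"] = np.where(df["Close"] > df["Open"], df["Close"] * df["High"], 0)

print(df)
```

This fills the whole column in one step, so there is no intermediate NaN and no fillna call is needed.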

Upvotes: 1
