Reputation: 1960
I'm sure this question is easy, but it has been stumping me for too long, so would REALLY appreciate some direction
I'm looking to add a column to a dataframe based on the results of two other columns
I want to identify if the stock is equal to the stock in the prior row and the date is equal to the date in the prior row.
I am looking to get the running count I tried something along the lines of the following
df['DayCount']=np.where(df['ticker'] ==df['ticker'].shift()) & np.where(df['trade_date']==df['trade_date'].shift() , 1, 0)
and
df['DayCount'] = df.where(df['ticker'] ==df['ticker'].shift() & df['trade_date']==df['trade_date'].shift(),1,0)
Sample input
Stock, Date, Time, Price
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92
And output:
Stock, Date,Time, Price, DayCount
IBM, 2014-09-01, 12:30:01, 50.5,1
IBM, 2014-09-01, 12:30:02, 50.7,2
IBM, 2014-09-01, 12:30:03, 50.9,3
IBM, 2014-09-02, 09:57:02, 52.7,1
IBM, 2014-09-02, 09:57:03, 52.9,2
AAPL, 2014-11-02, 09:57:02, 520.31,1
AAPL, 2014-11-02, 09:57:03, 520.92,2
I got errors like
TypeError: unsupported operand type(s) for &: 'str' and 'bool'
And then applying a cumulative count
.
First, and this is most important to me, how do you write the initial statement so that you can do the compare over multiple columns
Second, how would you add in the cumulative count
?
Thank you so much for helping
Expanding on the original post, here is another question.. Assume now that the data set is slightly different
Stock, Date, Time, Price,BidOffer
IBM, 2014-09-01, 12:30:01, 50.5, bid
IBM, 2014-09-01, 12:30:02, 50.7, offer
IBM, 2014-09-01, 12:30:03, 50.9, bid
IBM, 2014-09-02, 09:57:02, 52.7, bid
IBM, 2014-09-02, 09:57:03, 52.9, bid
AAPL, 2014-11-02, 09:57:02, 520.31, offer
AAPL, 2014-11-02, 09:57:03, 520.92, offer
And we are looking to see how many times in a row stocks traded on the bid or offer, so the output would be:
Stock, Date, Time, Price,BidOffer,Count
IBM, 2014-09-01, 12:30:01, 50.5, bid, 1
IBM, 2014-09-01, 12:30:02, 50.7, offer, 1
IBM, 2014-09-01, 12:30:03, 50.9, bid,1
IBM, 2014-09-02, 09:57:02, 52.7, bid,1
IBM, 2014-09-02, 09:57:03, 52.9, bid,2
AAPL, 2014-11-02, 09:57:02, 520.31, offer,1
AAPL, 2014-11-02, 09:57:03, 520.92, offer,2
Groupings are effectively Stock and Date, time is just used to determine sequence.. any help much appreciated on this expansion
Upvotes: 3
Views: 1535
Reputation: 210922
UPDATE3: "And we are looking to see how many times in a row stocks traded on the bid or offer"
In [112]: g = df.groupby(['Stock','Date'])
In [113]: df['Count'] = g['BidOffer'].apply(lambda x: (x == x.shift()).cumsum()) + 1
In [114]: df
Out[114]:
Stock Date Time Price BidOffer Count
0 IBM 2014-09-01 12:30:01 50.50 bid 1
1 IBM 2014-09-01 12:30:02 50.70 offer 1
2 IBM 2014-09-01 12:30:03 50.90 bid 1
3 IBM 2014-09-02 09:57:02 52.70 bid 1
4 IBM 2014-09-02 09:57:03 52.90 bid 2
5 AAPL 2014-11-02 09:57:02 520.31 offer 1
6 AAPL 2014-11-02 09:57:03 520.92 offer 2
UPDATE2:
In [515]: df['DayCount'] = df.groupby(['Stock', 'Date', 'BidOffer']).cumcount() + 1
In [516]: df
Out[516]:
Stock Date Time Price BidOffer DayCount
0 IBM 2014-09-01 12:30:01 50.50 bid 1
1 IBM 2014-09-01 12:30:02 50.70 offer 1
2 IBM 2014-09-01 12:30:03 50.90 bid 2
3 IBM 2014-09-02 09:57:02 52.70 bid 1
4 IBM 2014-09-02 09:57:03 52.90 bid 2
5 AAPL 2014-11-02 09:57:02 520.31 offer 1
6 AAPL 2014-11-02 09:57:03 520.92 offer 2
UPDATE:
In [489]: df['DayCount'] = df.groupby(['Stock', df.Datetime.dt.date]).cumcount() + 1
In [490]: df
Out[490]:
Stock Datetime Price DayCount
0 IBM 2014-09-01 12:30:01 50.50 1
1 IBM 2014-09-01 12:30:02 50.70 2
2 IBM 2014-09-01 12:30:03 50.90 3
3 IBM 2014-09-02 09:57:02 52.70 1
4 IBM 2014-09-02 09:57:03 52.90 2
5 AAPL 2014-11-02 09:57:02 520.31 1
6 AAPL 2014-11-02 09:57:03 520.92 2
Answer for the original question:
df['DayCount']=np.where(
(df['ticker']==df['ticker'].shift())
&
(df['trade_date']==df['trade_date'].shift()),
1,
0
)
The only thing that was missing in your second solution is parenthesis: np.where( (...) & (...), 1, 0)
Upvotes: 1