Pandas - Conditional Calculation Based on Shift Values from Two Other Columns

I'm sure this question is easy, but it has been stumping me for too long, so would REALLY appreciate some direction

I'm looking to add a column to a dataframe based on the results of two other columns

I want to identify if the stock is equal to the stock in the prior row and the date is equal to the date in the prior row.

I am looking to get the running count I tried something along the lines of the following

df['DayCount']=np.where(df['ticker'] ==df['ticker'].shift()) & np.where(df['trade_date']==df['trade_date'].shift() ,  1, 0)

and

df['DayCount'] = df.where(df['ticker'] ==df['ticker'].shift() &    df['trade_date']==df['trade_date'].shift(),1,0)

Sample input

Stock, Date, Time, Price 
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92

And output:

Stock, Date,Time, Price, DayCount 
IBM, 2014-09-01, 12:30:01, 50.5,1
IBM, 2014-09-01, 12:30:02, 50.7,2
IBM, 2014-09-01, 12:30:03, 50.9,3
IBM, 2014-09-02, 09:57:02, 52.7,1
IBM, 2014-09-02, 09:57:03, 52.9,2
AAPL, 2014-11-02, 09:57:02, 520.31,1
AAPL, 2014-11-02, 09:57:03, 520.92,2

I got errors like

TypeError: unsupported operand type(s) for &: 'str' and 'bool'

And then applying a cumulative count.

First, and this is most important to me, how do you write the initial statement so that you can do the compare over multiple columns

Second, how would you add in the cumulative count?

Thank you so much for helping

Expanding on the original post, here is another question.. Assume now that the data set is slightly different

Stock, Date, Time, Price,BidOffer
IBM, 2014-09-01, 12:30:01, 50.5, bid
IBM, 2014-09-01, 12:30:02, 50.7, offer
IBM, 2014-09-01, 12:30:03, 50.9, bid
IBM, 2014-09-02, 09:57:02, 52.7, bid
IBM, 2014-09-02, 09:57:03, 52.9, bid
AAPL, 2014-11-02, 09:57:02, 520.31, offer
AAPL, 2014-11-02, 09:57:03, 520.92, offer

And we are looking to see how many times in a row stocks traded on the bid or offer, so the output would be:

Stock, Date, Time, Price,BidOffer,Count
IBM, 2014-09-01, 12:30:01, 50.5, bid, 1 
IBM, 2014-09-01, 12:30:02, 50.7, offer, 1
IBM, 2014-09-01, 12:30:03, 50.9, bid,1
IBM, 2014-09-02, 09:57:02, 52.7, bid,1
IBM, 2014-09-02, 09:57:03, 52.9, bid,2
AAPL, 2014-11-02, 09:57:02, 520.31, offer,1
AAPL, 2014-11-02, 09:57:03, 520.92, offer,2

Groupings are effectively Stock and Date, time is just used to determine sequence.. any help much appreciated on this expansion

Upvotes: 3

Views: 1535

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210922

UPDATE3: "And we are looking to see how many times in a row stocks traded on the bid or offer"

In [112]: g = df.groupby(['Stock','Date'])

In [113]: df['Count'] = g['BidOffer'].apply(lambda x: (x == x.shift()).cumsum()) + 1

In [114]: df
Out[114]:
  Stock       Date      Time   Price BidOffer  Count
0   IBM 2014-09-01  12:30:01   50.50      bid      1
1   IBM 2014-09-01  12:30:02   50.70    offer      1
2   IBM 2014-09-01  12:30:03   50.90      bid      1
3   IBM 2014-09-02  09:57:02   52.70      bid      1
4   IBM 2014-09-02  09:57:03   52.90      bid      2
5  AAPL 2014-11-02  09:57:02  520.31    offer      1
6  AAPL 2014-11-02  09:57:03  520.92    offer      2

UPDATE2:

In [515]: df['DayCount'] = df.groupby(['Stock', 'Date', 'BidOffer']).cumcount() + 1

In [516]: df
Out[516]:
  Stock       Date      Time   Price BidOffer  DayCount
0   IBM 2014-09-01  12:30:01   50.50      bid         1
1   IBM 2014-09-01  12:30:02   50.70    offer         1
2   IBM 2014-09-01  12:30:03   50.90      bid         2
3   IBM 2014-09-02  09:57:02   52.70      bid         1
4   IBM 2014-09-02  09:57:03   52.90      bid         2
5  AAPL 2014-11-02  09:57:02  520.31    offer         1
6  AAPL 2014-11-02  09:57:03  520.92    offer         2

UPDATE:

In [489]: df['DayCount'] = df.groupby(['Stock', df.Datetime.dt.date]).cumcount() + 1

In [490]: df
Out[490]:
  Stock            Datetime   Price  DayCount
0   IBM 2014-09-01 12:30:01   50.50         1
1   IBM 2014-09-01 12:30:02   50.70         2
2   IBM 2014-09-01 12:30:03   50.90         3
3   IBM 2014-09-02 09:57:02   52.70         1
4   IBM 2014-09-02 09:57:03   52.90         2
5  AAPL 2014-11-02 09:57:02  520.31         1
6  AAPL 2014-11-02 09:57:03  520.92         2

Answer for the original question:

df['DayCount']=np.where(
                  (df['ticker']==df['ticker'].shift())
                  &
                  (df['trade_date']==df['trade_date'].shift()),
                  1,
                  0
)

The only thing that was missing in your second solution is parenthesis: np.where( (...) & (...), 1, 0)

Upvotes: 1

Related Questions