a.n.
a.n.

Reputation: 97

Calculation between rows in different dates in pandas

I have experience in python but im new in pandas, after all day trying to do this i cant solve it, i try a lot of differents methods but anyone works for me. I have a database like this:

    open    low   high  close  volume  interest                                                           
2008-08-05 10:00:00  22.53  22.12  22.53  22.14     186       186
2008-08-05 11:00:00  22.20  22.05  22.11  22.10     584       398
2008-08-05 12:00:00  22.19  22.04  22.08  22.15     848       264
2008-08-05 13:00:00  22.20  22.06  22.13  22.13     928        80
2008-08-05 14:00:00  22.19  22.03  22.13  22.08     988        60
2008-08-05 15:00:00  22.37  21.72  22.20  21.77    1418       430
2008-08-05 16:00:00  21.80  21.47  21.77  21.47    1666       248
2008-08-05 17:00:00  21.50  21.37  21.43  21.40    1865       199
.....

and im trying to make a substraction between close column of index.hour==17 and close column of index.hour==6, for each day.

If i try something like:

df[df.index.hour==17].close.sub(df[df.index.hour==6].open)

The result is NaN, if i try to make a new DataFrame, the result is Nan too. How i can do that? Maybe i dont understand well pandas, and is no so easy to do this and i need to make a complete script? Thanks

Upvotes: 0

Views: 92

Answers (2)

jrovegno
jrovegno

Reputation: 719

If you dataframe is well defined (You don't need to check that every day contains 6 and 17 o'clock data) you can use pandas.DataFrame.diff and query:

    >>> df['hour'] = df.index.hour
    >>> df.query('hour == 6 | hour == 17').diff()

                     open   low  high  close  volume  interest  hour
datetime                                                            
2008-08-05 06:00:00   NaN   NaN   NaN    NaN     NaN       NaN   NaN
2008-08-05 17:00:00 -1.03 -0.75  -1.1  -0.74    1679        13    11
     >>> df.query('hour == 6 | hour == 17').diff(-1)

                     open   low  high  close  volume  interest  hour
datetime                                                            
2008-08-05 06:00:00  1.03  0.75   1.1   0.74   -1679       -13   -11
2008-08-05 17:00:00   NaN   NaN   NaN    NaN     NaN       NaN   NaN

Upvotes: 1

CT Zhu
CT Zhu

Reputation: 54400

Might not be the most elegant way to do this, but let's see:

In [54]:
#changed your first observation to 6:00
print df

                      open    low   high  close  volume  interest
2008-08-05 06:00:00  22.53  22.12  22.53  22.14     186       186
2008-08-05 11:00:00  22.20  22.05  22.11  22.10     584       398
2008-08-05 12:00:00  22.19  22.04  22.08  22.15     848       264
2008-08-05 13:00:00  22.20  22.06  22.13  22.13     928        80
2008-08-05 14:00:00  22.19  22.03  22.13  22.08     988        60
2008-08-05 15:00:00  22.37  21.72  22.20  21.77    1418       430
2008-08-05 16:00:00  21.80  21.47  21.77  21.47    1666       248
2008-08-05 17:00:00  21.50  21.37  21.43  21.40    1865       199

In [55]:

df[np.in1d(df.index.hour, [6, 17])].groupby(pd.TimeGrouper(freq='24h')).close.agg(lambda x: x[-1]-x[0])

Out[55]:

2008-08-05    0.74
Freq: 24H, Name: close, dtype: float64

The idea is to first slice the data frame, such that it only contains 6 and 17 o'clock data. Then we groupby day, and calculate the difference between 17 and 6 o'clock closing price, FOR each day.

If you are interested in the difference between the last close and the first open:

In [96]:

grp = df.ix[np.in1d(df.index.hour, [6, 17]),

            ['open','close']].groupby(pd.TimeGrouper(freq='24h'))

grp.close.last()-grp.open.last()

Out[96]:

2008-08-05   -0.1
Freq: 24H, dtype: float64

Upvotes: 1

Related Questions