Reputation: 97
I have experience in python but im new in pandas, after all day trying to do this i cant solve it, i try a lot of differents methods but anyone works for me. I have a database like this:
open low high close volume interest
2008-08-05 10:00:00 22.53 22.12 22.53 22.14 186 186
2008-08-05 11:00:00 22.20 22.05 22.11 22.10 584 398
2008-08-05 12:00:00 22.19 22.04 22.08 22.15 848 264
2008-08-05 13:00:00 22.20 22.06 22.13 22.13 928 80
2008-08-05 14:00:00 22.19 22.03 22.13 22.08 988 60
2008-08-05 15:00:00 22.37 21.72 22.20 21.77 1418 430
2008-08-05 16:00:00 21.80 21.47 21.77 21.47 1666 248
2008-08-05 17:00:00 21.50 21.37 21.43 21.40 1865 199
.....
and im trying to make a substraction between close column of index.hour==17 and close column of index.hour==6, for each day.
If i try something like:
df[df.index.hour==17].close.sub(df[df.index.hour==6].open)
The result is NaN, if i try to make a new DataFrame, the result is Nan too. How i can do that? Maybe i dont understand well pandas, and is no so easy to do this and i need to make a complete script? Thanks
Upvotes: 0
Views: 92
Reputation: 719
If you dataframe is well defined (You don't need to check that every day contains 6 and 17 o'clock data) you can use pandas.DataFrame.diff and query:
>>> df['hour'] = df.index.hour
>>> df.query('hour == 6 | hour == 17').diff()
open low high close volume interest hour
datetime
2008-08-05 06:00:00 NaN NaN NaN NaN NaN NaN NaN
2008-08-05 17:00:00 -1.03 -0.75 -1.1 -0.74 1679 13 11
>>> df.query('hour == 6 | hour == 17').diff(-1)
open low high close volume interest hour
datetime
2008-08-05 06:00:00 1.03 0.75 1.1 0.74 -1679 -13 -11
2008-08-05 17:00:00 NaN NaN NaN NaN NaN NaN NaN
Upvotes: 1
Reputation: 54400
Might not be the most elegant way to do this, but let's see:
In [54]:
#changed your first observation to 6:00
print df
open low high close volume interest
2008-08-05 06:00:00 22.53 22.12 22.53 22.14 186 186
2008-08-05 11:00:00 22.20 22.05 22.11 22.10 584 398
2008-08-05 12:00:00 22.19 22.04 22.08 22.15 848 264
2008-08-05 13:00:00 22.20 22.06 22.13 22.13 928 80
2008-08-05 14:00:00 22.19 22.03 22.13 22.08 988 60
2008-08-05 15:00:00 22.37 21.72 22.20 21.77 1418 430
2008-08-05 16:00:00 21.80 21.47 21.77 21.47 1666 248
2008-08-05 17:00:00 21.50 21.37 21.43 21.40 1865 199
In [55]:
df[np.in1d(df.index.hour, [6, 17])].groupby(pd.TimeGrouper(freq='24h')).close.agg(lambda x: x[-1]-x[0])
Out[55]:
2008-08-05 0.74
Freq: 24H, Name: close, dtype: float64
The idea is to first slice the data frame, such that it only contains 6 and 17 o'clock data. Then we groupby day, and calculate the difference between 17 and 6 o'clock closing price, FOR each day.
If you are interested in the difference between the last close and the first open:
In [96]:
grp = df.ix[np.in1d(df.index.hour, [6, 17]),
['open','close']].groupby(pd.TimeGrouper(freq='24h'))
grp.close.last()-grp.open.last()
Out[96]:
2008-08-05 -0.1
Freq: 24H, dtype: float64
Upvotes: 1