nipy
nipy

Reputation: 5488

Select certain dates from Pandas dataframe

I am learning how to filter dates on a Pandas data frame and need some help with the following please. This is my original data frame (from this data):

data
Out[120]:
Open    High    Low Last    Volume  NumberOfTrades  BidVolume   AskVolume
Timestamp                               
2014-03-04 09:30:00 1783.50 1784.50 1783.50 1784.50 171 17  29  142
2014-03-04 09:31:00 1784.75 1785.75 1784.50 1785.25 28  21  10  18
2014-03-04 09:32:00 1785.00 1786.50 1785.00 1786.50 81  19  4   77
2014-03-04 09:33:00 1786.00 1786.00 1785.25 1785.25 41  14  8   33
2014-03-04 09:34:00 1785.00 1785.25 1784.75 1785.25 11  8   2   9
2014-03-04 09:35:00 1785.50 1786.75 1785.50 1785.75 49  27  13  36
2014-03-04 09:36:00 1786.00 1786.00 1785.25 1785.75 12  8   3   9
2014-03-04 09:37:00 1786.00 1786.25 1785.25 1785.25 15  8   10  5
2014-03-04 09:38:00 1785.50 1785.50 1784.75 1785.25 24  17  17  7

    data.dtypes
Out[118]:
Open              float64
High              float64
Low               float64
Last              float64
Volume              int64
NumberOfTrades      int64
BidVolume           int64
AskVolume           int64
dtype: object

I then resampled to 5 minute sections:

five_min = data.resample('5T').sum()

And look for the high volume days:

max_volume = five_min.Volume.at_time('9:30') > 65000

I then try to get the days high volume days as follows:

five_min.Volume = max_volume[max_volume == True]

for_high_vol = five_min.Volume.dropna()

for_high_vol

Timestamp
2014-03-21 09:30:00    True
2014-04-11 09:30:00    True
2014-04-16 09:30:00    True
2014-04-17 09:30:00    True
2014-07-18 09:30:00    True
2014-07-31 09:30:00    True
2014-09-19 09:30:00    True
2014-10-07 09:30:00    True
2014-10-10 09:30:00    True
2014-10-14 09:30:00    True
2014-10-15 09:30:00    True
2014-10-16 09:30:00    True
2014-10-17 09:30:00    True

I would like to use the index from "for_high_vol" to select all of the days from the original "data" Pandas dataframe.

Im sure there are much better was to approach this so can someone please show me the simplest way to do this?

Upvotes: 1

Views: 118

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210822

IIUC, you can do it this way:

x.ix[(x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum') > 65000)
     &
     (x.Timestamp.dt.hour==9)
     &
     (x.Timestamp.dt.minute>=30) & (x.Timestamp.dt.minute<=34)]

in order to set index back:

x.ix[(x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum') > 65000)
     &
     (x.Timestamp.dt.hour==9)
     &
     (x.Timestamp.dt.minute>=30) & (x.Timestamp.dt.minute<=34)].set_index('Timestamp')

PS Timestamp is a regular column in my DF, not an index

Explanation:

resample / group our DF by 5 minutes interval, calculate the sum of Volume for each group and assign this sum to all rows in the group. For example in the example below 332 - is the sum of Volume in the first 5-min group

In [41]: (x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum')).head(10)
Out[41]:
0    332
1    332
2    332
3    332
4    332
5    113
6    113
7    113
8    113
9    113
dtype: int64

filter time - the conditions are self-explanatory:

(x.Timestamp.dt.hour==9) & (x.Timestamp.dt.minute>=30) & (x.Timestamp.dt.minute<=34)].set_index('Timestamp')

and finally combine all conditions (filters) together - pass it to .ix[] indexer and set index back to Timestamp:

x.ix[(x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum') > 65000)
     &
     (x.Timestamp.dt.hour==9)
     &
     (x.Timestamp.dt.minute>=30) & (x.Timestamp.dt.minute<=34)].set_index('Timestamp')

Output:

Out[32]:
                 Timestamp     Open     High      Low     Last  Volume  NumberOfTrades  BidVolume  AskVolume
5011   2014-03-21 09:30:00  1800.75  1802.50  1800.00  1802.25   30181            6006      13449      16732
5012   2014-03-21 09:31:00  1802.50  1803.25  1802.25  1802.50   15588            3947       5782       9806
5013   2014-03-21 09:32:00  1802.50  1803.75  1802.25  1803.25   16409            3994       6867       9542
5014   2014-03-21 09:33:00  1803.00  1803.50  1802.75  1803.25   10790            3158       4781       6009
5015   2014-03-21 09:34:00  1803.25  1804.75  1803.25  1804.75   13377            3466       4690       8687
11086  2014-04-11 09:30:00  1744.75  1744.75  1743.00  1743.50   21504            5876      11178      10326
11087  2014-04-11 09:31:00  1743.50  1746.50  1743.25  1746.00   21582            6191       8830      12752
11088  2014-04-11 09:32:00  1746.00  1746.50  1744.25  1745.75   18961            5214       9521       9440
11089  2014-04-11 09:33:00  1746.00  1746.25  1744.00  1744.25   12832            3658       7219       5613
11090  2014-04-11 09:34:00  1744.25  1744.25  1742.00  1742.75   15478            4919       8912       6566
12301  2014-04-16 09:30:00  1777.50  1778.25  1776.25  1777.00   21178            5431      10775      10403
12302  2014-04-16 09:31:00  1776.75  1779.25  1776.50  1778.50   16456            4400       6351      10105
12303  2014-04-16 09:32:00  1778.50  1779.25  1777.25  1777.50    9956            3015       5810       4146
12304  2014-04-16 09:33:00  1777.50  1778.00  1776.25  1776.25    8724            2470       5326       3398
12305  2014-04-16 09:34:00  1776.25  1777.00  1775.50  1776.25    9566            2968       5098       4468
12706  2014-04-17 09:30:00  1781.50  1782.50  1781.25  1782.25   16474            4583       7510       8964
12707  2014-04-17 09:31:00  1782.25  1782.50  1781.00  1781.25   10328            2587       6310       4018
12708  2014-04-17 09:32:00  1781.25  1782.25  1781.00  1781.25    9072            2142       4618       4454
12709  2014-04-17 09:33:00  1781.00  1781.75  1780.25  1781.25   17866            3807      10665       7201
12710  2014-04-17 09:34:00  1781.50  1782.25  1780.50  1781.75   11322            2523       5538       5784
38454  2014-07-18 09:30:00  1893.50  1893.75  1892.50  1893.00   24864            5135      13874      10990
38455  2014-07-18 09:31:00  1892.75  1893.50  1892.75  1892.75    8003            1751       3571       4432
38456  2014-07-18 09:32:00  1893.00  1893.50  1892.75  1893.50    7062            1680       3454       3608
38457  2014-07-18 09:33:00  1893.25  1894.25  1893.00  1894.25   10581            1955       3925       6656
38458  2014-07-18 09:34:00  1894.25  1895.25  1894.00  1895.25   15309            3347       5516       9793
42099  2014-07-31 09:30:00  1886.25  1886.25  1884.25  1884.75   21668            5857      11910       9758
42100  2014-07-31 09:31:00  1884.50  1884.75  1882.25  1883.00   17487            5186      11403       6084
42101  2014-07-31 09:32:00  1883.00  1884.50  1882.50  1884.00   13174            3782       4791       8383
42102  2014-07-31 09:33:00  1884.25  1884.50  1883.00  1883.25    9095            2814       5299       3796
42103  2014-07-31 09:34:00  1883.25  1884.25  1883.00  1884.25    7593            2528       3794       3799
...                    ...      ...      ...      ...      ...     ...             ...        ...        ...
193508 2016-01-21 09:30:00  1838.00  1838.75  1833.00  1834.00   22299            9699      12666       9633
193509 2016-01-21 09:31:00  1834.00  1836.50  1833.00  1834.50    8851            4520       4010       4841
193510 2016-01-21 09:32:00  1834.25  1835.25  1832.50  1833.25    7957            3672       3582       4375
193511 2016-01-21 09:33:00  1833.00  1838.50  1832.00  1838.00   12902            5564       5174       7728
193512 2016-01-21 09:34:00  1838.00  1841.50  1837.75  1840.50   13991            6130       6799       7192
199178 2016-02-10 09:30:00  1840.00  1841.75  1839.00  1840.75   13683            5080       6743       6940
199179 2016-02-10 09:31:00  1840.75  1842.00  1838.75  1841.50   11753            4623       5616       6137
199180 2016-02-10 09:32:00  1841.50  1844.75  1840.75  1843.00   16402            6818       8226       8176
199181 2016-02-10 09:33:00  1843.00  1843.50  1841.00  1842.00   14963            5402       8431       6532
199182 2016-02-10 09:34:00  1842.25  1843.50  1840.00  1840.00    8397            3475       4537       3860
200603 2016-02-16 09:30:00  1864.00  1866.25  1863.50  1864.75   19585            6865       9548      10037
200604 2016-02-16 09:31:00  1865.00  1865.50  1863.75  1864.25   16604            5936       8095       8509
200605 2016-02-16 09:32:00  1864.25  1864.75  1862.75  1863.50   10126            4713       5591       4535
200606 2016-02-16 09:33:00  1863.25  1863.75  1861.50  1862.25    9648            3786       5824       3824
200607 2016-02-16 09:34:00  1862.25  1863.50  1861.75  1862.25   10748            4143       5413       5335
205058 2016-03-02 09:30:00  1952.75  1954.25  1952.00  1952.75   19812            6684      10350       9462
205059 2016-03-02 09:31:00  1952.75  1954.50  1952.25  1953.50   10163            4236       3884       6279
205060 2016-03-02 09:32:00  1953.50  1954.75  1952.25  1952.50   15771            5519       8135       7636
205061 2016-03-02 09:33:00  1952.75  1954.50  1952.50  1953.75    9556            3583       3768       5788
205062 2016-03-02 09:34:00  1953.75  1954.75  1952.25  1952.50   11898            4463       6459       5439
209918 2016-03-18 09:30:00  2027.50  2028.25  2026.50  2028.00   38092            8644      17434      20658
209919 2016-03-18 09:31:00  2028.00  2028.25  2026.75  2027.25   11631            3209       6384       5247
209920 2016-03-18 09:32:00  2027.25  2027.75  2027.00  2027.50    9664            3270       5080       4584
209921 2016-03-18 09:33:00  2027.50  2027.75  2026.75  2026.75   10610            3117       5358       5252
209922 2016-03-18 09:34:00  2026.75  2027.00  2026.00  2026.50    8076            3022       4670       3406
227722 2016-05-20 09:30:00  2034.25  2035.25  2033.50  2034.50   30272            7815      16098      14174
227723 2016-05-20 09:31:00  2034.75  2035.75  2034.50  2035.50   12997            3690       6458       6539
227724 2016-05-20 09:32:00  2035.50  2037.50  2035.50  2037.25   12661            3864       5233       7428
227725 2016-05-20 09:33:00  2037.25  2037.75  2036.50  2037.00    9057            2524       5190       3867
227726 2016-05-20 09:34:00  2037.00  2037.50  2036.75  2037.00    5190            1620       2748       2442

[255 rows x 9 columns]

Upvotes: 1

Related Questions