user1913171

Reputation: 283

Time series data indexing using pandas or NumPy

Below is my 1-minute OHLC data.

2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
...
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
...

I loaded the data into a DataFrame, and here is the result:

                          2       3       4       5
0_1                                                                    
2011-11-01 09:00:00  248.50  248.95  248.20  248.70
2011-11-01 09:01:00  248.70  249.00  248.65  248.85
2011-11-01 09:02:00  248.90  249.25  248.70  249.15
2011-11-01 09:03:00  249.20  249.60  249.10  249.60
2011-11-01 09:04:00  249.55  249.95  249.50  249.60

I'd like to add 4 columns like the following in order to use groupby:

                          2       3       4       5    year month day time
0_1                                                                    
2011-11-01 09:00:00  248.50  248.95  248.20  248.70       0      0  0    0
2011-11-01 09:01:00  248.70  249.00  248.65  248.85       0      0  0    1
2011-11-01 09:02:00  248.90  249.25  248.70  249.15       0      0  0    2
2011-11-01 09:03:00  249.20  249.60  249.10  249.60       0      0  0    3
2011-11-01 09:04:00  249.55  249.95  249.50  249.60       0      0  0    4
...
2011-11-02 09:00:00  248.50  248.95  248.20  248.70       0      0  1    0
2011-11-02 09:01:00  248.70  249.00  248.65  248.85       0      0  1    1
2011-11-02 09:02:00  248.90  249.25  248.70  249.15       0      0  1    2
2011-11-02 09:03:00  249.20  249.60  249.10  249.60       0      0  1    3
2011-11-02 09:04:00  249.55  249.95  249.50  249.60       0      0  1    4

How can I add index columns like these?

Thank you in advance.

Upvotes: 1

Views: 247

Answers (1)

Viktor Kerkez

Reputation: 46566

You can do it using the relativedelta class from the dateutil library, which gives the elapsed years, months and days between two timestamps.

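import pandas as pd
from dateutil.relativedelta import relativedelta

# Use the first timestamp in the index as the reference point
start = df.index[0]

def func(item):
    # Calendar offset of this timestamp from the start
    delta = relativedelta(item, start)
    return (delta.years, delta.months, delta.days)

>>> pd.DataFrame(list(df.index.map(func)),
...              index=df.index, columns=['year', 'month', 'day'])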

                     year  month  day
0_1                                  
2011-11-01 09:00:00     0      0    0
2011-11-01 09:01:00     0      0    0
2011-11-01 09:02:00     0      0    0
2011-11-01 15:03:00     0      0    0
2011-11-01 15:04:00     0      0    0
2011-11-01 15:15:00     0      0    0
2011-11-02 09:00:00     0      0    1
2011-11-02 09:01:00     0      0    1
2011-11-02 09:02:00     0      0    1
2011-11-02 09:03:00     0      0    1

After this, you can merge the result with your original DataFrame on the index.
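
For the merge step, something like the sketch below should work. It is only an illustration: the small `df` is a stand-in built from a few of the sample rows in the question, `offsets` and `merged` are names I made up, and I'm assuming the index is a unique DatetimeIndex so that `DataFrame.join` can align on it.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Toy stand-in for the question's DataFrame (values taken from the sample rows)
index = pd.to_datetime([
    "2011-11-01 09:00:00",
    "2011-11-01 09:01:00",
    "2011-11-02 09:00:00",
    "2011-11-02 09:01:00",
])
df = pd.DataFrame(
    {2: [248.50, 248.70, 245.55, 245.85],
     3: [248.95, 249.00, 246.25, 246.40],
     4: [248.20, 248.65, 245.40, 245.75],
     5: [248.70, 248.85, 245.80, 246.35]},
    index=index,
)

start = df.index[0]

def func(item):
    # Calendar offset of this timestamp from the first one
    delta = relativedelta(item, start)
    return (delta.years, delta.months, delta.days)

# One row of (year, month, day) offsets per timestamp, sharing df's index
offsets = pd.DataFrame([func(ts) for ts in df.index],
                       index=df.index, columns=["year", "month", "day"])

# join aligns on the index by default, so the new columns land on the right rows
merged = df.join(offsets)
print(merged)
```

Note that the offsets are measured from the first timestamp (09:00 here), so a later day that starts before 09:00 would still be counted as the previous day. If that can happen in your data, it may be safer to compare normalized dates instead.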

I'm not sure what the time column represents, though. Is it the minutes?
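
If the time column is meant to be a counter that restarts at 0 on each day (that is my guess from the 0, 1, 2, ... values in the question, not something the question states), a rough sketch would be to group by calendar day and use `cumcount`. The `close` column and the sample frame here are placeholders of mine.

```python
import pandas as pd

# Toy stand-in: a few 1-minute bars spread over two days
index = pd.to_datetime([
    "2011-11-01 09:00:00",
    "2011-11-01 09:01:00",
    "2011-11-01 09:02:00",
    "2011-11-02 09:00:00",
    "2011-11-02 09:01:00",
])
df = pd.DataFrame({"close": [248.70, 248.85, 249.15, 245.80, 246.35]},
                  index=index)

# normalize() drops the time of day, so all rows from one day share a key;
# cumcount() then numbers the rows within each day starting from 0
df["time"] = df.groupby(df.index.normalize()).cumcount()
print(df)
```

This counts rows, not elapsed minutes, so the two differ wherever bars are missing (for example the jump from 15:04 to 15:15 in the sample data).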

Upvotes: 3
