Tom

Reputation: 8790

Make a new index after grouping by hour and minute (or concatenating 2 levels of a multiindex)

Following this discussion, you can group data by the time of day rather than by the full datetime:

import pandas as pd
import numpy as np

dr = pd.date_range('01-01-2020', '01-03-2020', freq='30T')
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)
df = df.groupby([df.index.hour, df.index.minute]).sum()

This aggregates the data as expected, but returns a MultiIndex DataFrame, with unlabeled hours at one level and unlabeled minutes in another:

              0
0  0   1.383863
   30  1.293589
1  0   0.545617
   30  0.208717
2  0   0.464892
   30  0.392486
...

I want to create a new single index which represents the times either as strings or as actual datetime.time(). I can get the times into a string, doing something like the following:

hours = df.index.get_level_values(0).astype(str).str.pad(2, fillchar='0')
minutes = df.index.get_level_values(1).astype(str).str.pad(2, fillchar='0')
new = hours + ':' + minutes
df = df.set_index(new, drop=True)
df.index.name = 'time'

Output:

              0
time           
00:00  1.203189
00:30  1.943932
01:00  1.671113
01:30  1.640122
02:00  1.240969
02:30  0.687489
...

This works but is a very cumbersome set of commands. My questions are:

  1. Is there an easier way to convert this MultiIndex to a combined string version?
  2. How could I convert the MultiIndex to datetime.time()? I know I could convert the original index with df.index = df.index.time, but I haven't found a way to take the hours from one level of index and minutes from another level of index to get time objects.

Upvotes: 0

Views: 248

Answers (1)

Chris

Reputation: 16147

To group into strings:

import pandas as pd
import numpy as np
    
dr = pd.date_range('01-01-2020', '01-03-2020', freq='30T')
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)
df = df.groupby(df.index.strftime('%H:%M')).sum()
df.index.name = 'time'

Or to group into datetime.time, replace the groupby line with:

df = df.groupby(df.index.time).sum()
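
If the frame has already been grouped by hour and minute, the two index levels can also be combined after the fact. The following is only a sketch, assuming a two-level (hour, minute) MultiIndex like the one in the question; the names grouped, by_time and by_label are made up for illustration, and '30min' is used as the frequency alias in place of '30T':

```python
import datetime

import numpy as np
import pandas as pd

dr = pd.date_range('2020-01-01', '2020-01-03', freq='30min')
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)

# Reproduce the (hour, minute) MultiIndex from the question
grouped = df.groupby([df.index.hour, df.index.minute]).sum()

# Iterating a MultiIndex yields one (hour, minute) tuple per row
times = [datetime.time(int(h), int(m)) for h, m in grouped.index]
by_time = grouped.copy()
by_time.index = pd.Index(times, name='time')

# Same idea for zero-padded 'HH:MM' strings
labels = [f'{int(h):02d}:{int(m):02d}' for h, m in grouped.index]
by_label = grouped.copy()
by_label.index = pd.Index(labels, name='time')
```

Grouping directly on the time of day, as above, is shorter; this variant is mainly useful when only the already-aggregated frame is available.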

Upvotes: 1
