Reputation: 360
Setup: I have a multiindex dataframe data like this;
value
date date
2015-08-13 00:00:00+10:00 2015-08-13 06:30:00+10:00 0.812689
2015-08-13 15:30:00+10:00 0.054290
2015-08-13 16:00:00+10:00 0.206277
2015-08-13 16:30:00+10:00 0.082520
2015-08-13 17:00:00+10:00 0.009448
2015-08-13 17:30:00+10:00 0.000000
2015-08-14 00:00:00+10:00 2015-08-14 06:30:00+10:00 0.000000
2015-08-14 07:00:00+10:00 0.000280
2015-08-14 07:30:00+10:00 0.034119
2015-08-14 08:00:00+10:00 0.168524
2015-08-14 08:30:00+10:00 0.471783
2015-08-14 09:00:00+10:00 0.522409
As an interim step I make the first index level to just be dates and the second index level to just be times, which I have done with,
# set index level 0 to dates
day_start=[i.date() for i in data.index.levels[0]]
data.index.set_levels(day_start, level=0, inplace=True)
# set index level 1 to times
interval_start=[i.time() for i in data.index.levels[1]]
data_interval.index.set_levels(interval_start, level=1, inplace=True)
# rename time index
data.index.set_names('time', level=1, inplace=True)
Maybe not the best way to do it but it gives,
value
date time
2015-08-13 06:30:00 0.812689
15:30:00 0.054290
16:00:00 0.206277
16:30:00 0.082520
17:00:00 0.009448
17:30:00 0.000000
2015-08-14 06:30:00 0.000000
07:00:00 0.000280
07:30:00 0.034119
08:00:00 0.168524
08:30:00 0.471783
09:00:00 0.522409
Problem: What I haven't been able to do next is reindex the time so there's an index every 30 minutes from 00:00 to 23:30, with zeros filled in for missing data. This would make it consistent for every day, which may have different start/end times with data. i.e.
value
date time
2015-08-13 00:00:00 0.0
00:30:00 0.0
:
06:30:00 0.812689
07:00:00 0.0
07:30:00 0.0
:
15:30:00 0.054290
16:00:00 0.206277
16:30:00 0.082520
:
23:30:00 0.0
And so on for each day. Trying to reindex on level=1 seems to have no effect when passing in an array of 30 minute spaced times. Not sure this is even the right approach.
Next step: What I'd like to do after that is data.unstack(level=1) so all the time indices become column headers. If I unstack it as is I get a weird mash up of columns with repeating times (which is mainly why I'm trying to make them consistent between days in the first place). Something like;
value
time 06:30:00 15:30:00 16:00:00 16:30:00 17:00:00 17:30:00 06:30:00
date
2015-08-13 0.812689 0.05429 0.206277 0.08252 0.009448 0.0 0.0
2015-08-14 0.000000 0.00000 0.000000 0.00000 0.000000 0.0 0.0
2015-08-15 0.000000 0.00000 0.000000 0.00000 0.000000 0.0 0.0
2015-08-16 0.000000 0.00000 0.000000 0.00000 0.000000 0.0 0.0
2015-08-17 0.000000 0.00000 0.000000 0.00000 0.000000 0.0 0.0
There's lots of missing data on those days so it didn't go into the correct columns I'm guessing. I'm probably fundamentally missing something in the reindexing and maybe my whole approach is not the way to get the end result.
Upvotes: 0
Views: 815
Reputation: 249123
First, just discard the "date" column. It is redundant and hurts more than it helps. That's df.index = df.index.droplevel(0)
.
Now you have this:
value
time
2015-08-13 06:30:00 0.812689
2015-08-13 15:30:00 0.054290
2015-08-13 16:00:00 0.206277
2015-08-13 16:30:00 0.082520
2015-08-13 17:00:00 0.009448
2015-08-13 17:30:00 0.000000
2015-08-14 06:30:00 0.000000
2015-08-14 07:00:00 0.000280
2015-08-14 07:30:00 0.034119
2015-08-14 08:00:00 0.168524
2015-08-14 08:30:00 0.471783
2015-08-14 09:00:00 0.522409
Then, df.resample('30min').first().fillna(0)
:
value
time
2015-08-13 06:30:00 0.812689
2015-08-13 07:00:00 0.000000
2015-08-13 07:30:00 0.000000
2015-08-13 08:00:00 0.000000
...
Now split the index into separate date and time parts:
df['date'] = df.index.date
df['time'] = df.index.time
And finally, pivot:
df.pivot(values='value', index='date', columns='time')
Upvotes: 2