metersk

Reputation: 12529

Plotting an awkward pandas multi index dataframe

I have a very awkward dataframe that looks like this:

+----+------+-------+-------+--------+----+--------+
|    |      | hour1 | hour2 | hour3  | …  | hour24 |
+----+------+-------+-------+--------+----+--------+
| id | date |       |       |        |    |        |
| 1  | 3    |     4 |     0 |     96 | 88 |     35 |
|    | 4    |    10 |     2 |     54 | 42 |     37 |
|    | 5    |     9 |    32 |      8 | 70 |     34 |
|    | 6    |    36 |    89 |     69 | 46 |     78 |
| 2  | 5    |    17 |    41 |     48 | 45 |     71 |
|    | 6    |    50 |    66 |     82 | 72 |     59 |
|    | 7    |    14 |    24 |     55 | 20 |     89 |
|    | 8    |    76 |    36 |     13 | 14 |     21 |
| 3  | 5    |    97 |    19 |     41 | 61 |     72 |
|    | 6    |    22 |     4 |     56 | 82 |     15 |
|    | 7    |    17 |    57 |     30 | 63 |     88 |
|    | 8    |    83 |    43 |     35 |  8 |      4 |
+----+------+-------+-------+--------+----+--------+

For each id there is a list of dates, and for each date the hour columns hold that day's data broken out by hour across the full 24 hours.

What I would like to do is plot (using matplotlib) the full hourly data for each of the ids, but I can't think of a way to do this. I was looking into the possibility of creating numpy matrices, but I'm not sure if that is the right path to go down.

Clarification: Essentially, for each id I want to concatenate all the hourly data together in order and plot that. I already have the days in the proper order, so I imagine it's just a matter of finding a way to put all of the hourly data for each id into one object.

Any thoughts on how to best accomplish this?

Here is some sample data in csv format: http://www.sharecsv.com/s/e56364930ddb3d04dec6994904b05cc6/test1.csv

Upvotes: 0

Views: 4208

Answers (3)

Moritz

Reputation: 5418

I am not totally happy with this solution, but maybe it can serve as a starting point. Since your data is cyclic, I chose a polar chart. Unfortunately, the resolution in the y direction is poor, so I zoomed into the plot manually:

import pandas as pd
import numpy as np
from itertools import cycle
from matplotlib import pyplot as plt

df = pd.read_csv('test1.csv')
df_new = df.set_index(['id','date'])
n = len(df_new.columns)

# convert from hours to rad
angle = np.linspace(0,2*np.pi,n)

# color palette to cycle through
n_data = len(df_new.T.columns)
# integer division by two since you have 'red' and 'blue'; linspace needs an int count
color = plt.cm.Paired(np.linspace(0,1,n_data//2))
c_iter = cycle(color)

fig = plt.figure()
ax = fig.add_subplot(111, polar=True)

# loop through the columns and manually select one category
for ind, i in enumerate(df_new.T.columns):
    if i[0] == 'red':
        # next(c_iter) works on both Python 2 and 3; c_iter.next() is Python 2 only
        ax.plot(angle,df_new.T[i].values,color=next(c_iter),label=i,linewidth=2)

# set the labels
ax.set_xticks(np.linspace(0, 2*np.pi, 24, endpoint=False))
ax.set_xticklabels(range(24))

# make the legend
ax.legend(loc='upper left', bbox_to_anchor = (1.2,1.1))
plt.show()

[Three images: the polar plot at zoom levels 0, 1, and 2]

Upvotes: 2

JoeCondron

Reputation: 8906

It might also be of interest to stack the data frame so that the dates and times sit together in the same index. For example,

df = df.stack().unstack(0)

puts the dates and times in the index and the ids in the column names. Calling df.plot() then gives a line plot for each time series on the same axes. So you could do it as

ax = df.stack().unstack(0).plot()

and format the axes either by passing arguments to the plot method or by calling methods on ax.
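To make the reshaping concrete, here is a minimal sketch on made-up data. The toy frame (two ids, three hour columns instead of 24) and the name long_df are assumptions for illustration only, not taken from the question's CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two ids, two dates each, three hour columns.
index = pd.MultiIndex.from_tuples(
    [(1, 3), (1, 4), (2, 4), (2, 5)], names=["id", "date"]
)
df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=index,
    columns=["hour1", "hour2", "hour3"],
)

# stack() moves the hour columns into the innermost index level;
# unstack(0) then moves the id level out into the columns.
long_df = df.stack().unstack(0)

print(long_df)
```

The result is indexed by (date, hour) with one column per id, so long_df.plot() would draw one line per id. Note that an id with no data for a given date gets NaN there, which shows up as a gap in its line.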

Upvotes: 2

BrenBarn

Reputation: 251468

Here is one approach:

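import numpy as np
from matplotlib import pyplot

# d is the DataFrame indexed by (id, date), one column per hour
for groupID, data in d.groupby(level='id'):
    fig = pyplot.figure()
    ax = fig.gca()
    ax.plot(data.values.ravel())
    # one tick at the start of each day (24 hourly values per row)
    ax.set_xticks(np.arange(len(data))*24)
    ax.set_xticklabels(data.index.get_level_values('date'))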

ravel is a numpy method that will string out multiple rows into one long 1D array.
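As a small illustration of what ravel does here, consider a made-up block of two dates by three hours for a single id (the numbers are hypothetical):

```python
import numpy as np

# Hypothetical toy block: two dates (rows) by three hours (columns) for one id.
values = np.array([[4, 0, 96],
                   [10, 2, 54]])

# ravel() flattens row by row (C order by default), so each day's hours
# stay together and the days follow one another in their original order.
flat = values.ravel()

print(flat.tolist())  # [4, 0, 96, 10, 2, 54]
```

Because the rows are already sorted by date within each id, the flattened array is the concatenated hourly series the question asks for.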

Beware running this interactively on a large dataset, as it creates a separate plot for each line. If you want to save the plots or the like, set a noninteractive matplotlib backend and use savefig to save each figure, then close it before creating the next one.
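A rough sketch of that save-and-close loop might look like the following. The toy frame, the temporary output directory, and the id_<n>.png naming scheme are all assumptions for illustration; substitute your own DataFrame and paths.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # noninteractive backend, selected before any figure is drawn
from matplotlib import pyplot
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for d: two ids, two dates each, three hours.
index = pd.MultiIndex.from_tuples(
    [(1, 3), (1, 4), (2, 5), (2, 6)], names=["id", "date"]
)
d = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=index,
    columns=["hour1", "hour2", "hour3"],
)

out_dir = tempfile.mkdtemp()  # assumed output location
saved = []
for group_id, data in d.groupby(level="id"):
    fig, ax = pyplot.subplots()
    ax.plot(data.values.ravel())
    path = os.path.join(out_dir, f"id_{group_id}.png")  # hypothetical naming scheme
    fig.savefig(path)
    pyplot.close(fig)  # release this figure before creating the next one
    saved.append(path)
```

Closing each figure right after saving keeps only one figure in memory at a time, which matters when there are many ids.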

Upvotes: 2
