Reputation: 12529
I have a very awkward dataframe that looks like this:
+----+------+-------+-------+--------+----+--------+
| | | hour1 | hour2 | hour 3 | … | hour24 |
+----+------+-------+-------+--------+----+--------+
| id | date | | | | | |
| 1 | 3 | 4 | 0 | 96 | 88 | 35 |
| | 4 | 10 | 2 | 54 | 42 | 37 |
| | 5 | 9 | 32 | 8 | 70 | 34 |
| | 6 | 36 | 89 | 69 | 46 | 78 |
| 2 | 5 | 17 | 41 | 48 | 45 | 71 |
| | 6 | 50 | 66 | 82 | 72 | 59 |
| | 7 | 14 | 24 | 55 | 20 | 89 |
| | 8 | 76 | 36 | 13 | 14 | 21 |
| 3 | 5 | 97 | 19 | 41 | 61 | 72 |
| | 6 | 22 | 4 | 56 | 82 | 15 |
| | 7 | 17 | 57 | 30 | 63 | 88 |
| | 8 | 83 | 43 | 35 | 8 | 4 |
+----+------+-------+-------+--------+----+--------+
For each id there is a list of dates, and for each date the hour columns hold that full day's worth of data, broken out by hour for all 24 hours.
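For reference, the layout above is how pandas prints a DataFrame with a two-level (id, date) row index. A minimal sketch that rebuilds the first two ids, keeping only three hour columns for brevity (the values are copied from the table; the column names hour1 to hour3 are an assumption about the CSV):

```python
import pandas as pd

# Rows copied from the first two ids in the table above; only the
# hour1..hour3 columns are kept to keep the sketch short.
rows = [
    (1, 3, 4, 0, 96),
    (1, 4, 10, 2, 54),
    (1, 5, 9, 32, 8),
    (1, 6, 36, 89, 69),
    (2, 5, 17, 41, 48),
    (2, 6, 50, 66, 82),
    (2, 7, 14, 24, 55),
    (2, 8, 76, 36, 13),
]
df = pd.DataFrame(rows, columns=["id", "date", "hour1", "hour2", "hour3"])

# Moving id and date into the index gives the two-level layout shown above
df = df.set_index(["id", "date"])
print(df)
```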
What I would like to do is plot (using matplotlib) the full hourly data for each of the ids, but I can't think of a way to do this. I was looking into the possibility of creating numpy matrices, but I'm not sure if that is the right path to go down.
Clarification: Essentially, for each id I want to concatenate all the hourly data together in order and plot that. I already have the days in the proper order, so I imagine it's just a matter of finding a way to put all of the hourly data for each id into one object.
Any thoughts on how to best accomplish this?
Here is some sample data in csv format: http://www.sharecsv.com/s/e56364930ddb3d04dec6994904b05cc6/test1.csv
Upvotes: 0
Views: 4208
Reputation: 5418
I am not totally happy with this solution, but maybe it can serve as a starting point. Since your data is cyclic, I chose a polar chart. Unfortunately, the resolution in the radial (y) direction is poor, so I zoomed into the plot manually:
import pandas as pd
import numpy as np
from itertools import cycle
from matplotlib import pyplot as plt
df = pd.read_csv('test1.csv')
df_new = df.set_index(['id', 'date'])
n = len(df_new.columns)
# convert from hours to radians; endpoint=False keeps the last hour from
# landing on top of hour 0 and matches the tick positions set below
angle = np.linspace(0, 2*np.pi, n, endpoint=False)
# color palette to cycle through
n_data = len(df_new.T.columns)
color = plt.cm.Paired(np.linspace(0, 1, n_data // 2))  # divided by two since you have 'red' and 'blue'
c_iter = cycle(color)
fig = plt.figure()
ax = fig.add_subplot(111, polar=True)
# loop through the (id, date) columns and manually select one category
for i in df_new.T.columns:
    if i[0] == 'red':
        ax.plot(angle, df_new.T[i].values, color=next(c_iter), label=i, linewidth=2)
# set the labels
ax.set_xticks(np.linspace(0, 2*np.pi, 24, endpoint=False))
ax.set_xticklabels(range(24))
# make the legend
ax.legend(loc='upper left', bbox_to_anchor=(1.2, 1.1))
plt.show()
(Screenshots of the resulting polar plot at three zoom levels, Zoom 0 to Zoom 2.)
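The tick positions in the code above come from np.linspace(0, 2*np.pi, 24, endpoint=False). A small check of what that mapping produces, assuming 24 hourly columns: hour h lands at 2*pi*h/24, so the last hour stops one step short of wrapping back onto hour 0. With the default endpoint=True, the 24th value would be exactly 2*pi, the same direction as hour 0.

```python
import numpy as np

n_hours = 24
# endpoint=False spaces the 24 hours evenly without repeating 0 rad at 2*pi
angle = np.linspace(0, 2 * np.pi, n_hours, endpoint=False)

# Hour 6 is a quarter turn, hour 12 a half turn
print(angle[6], np.pi / 2)
print(angle[12], np.pi)

# With the default endpoint=True, the last value is 2*pi, on top of hour 0
wrapped = np.linspace(0, 2 * np.pi, n_hours)[-1]
print(wrapped)
```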
Upvotes: 2
Reputation: 8906
It might also be of interest to stack the data frame so that you have the dates and times together in the same index. For example, assuming df has the (id, date) row index shown in the question, doing
df = df.stack().unstack(0)
will put the dates and times in the index and the ids as the column names. Calling df.plot() will then give you a line plot for each time series on the same axes. So you could do it as
ax = df.stack().unstack(0).plot()
and format the axes either by passing arguments to the plot method or by calling methods on ax.
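A minimal sketch on a hypothetical toy frame shaped like the question's (two ids, two dates each, only two hour columns), showing where each level ends up after the reshape:

```python
import pandas as pd

# Hypothetical toy frame in the question's layout: (id, date) row index,
# one column per hour (only two hours kept for brevity)
df = pd.DataFrame(
    {"hour1": [4, 10, 17, 50], "hour2": [0, 2, 41, 66]},
    index=pd.MultiIndex.from_tuples(
        [(1, 3), (1, 4), (2, 5), (2, 6)], names=["id", "date"]
    ),
)

# stack() moves the hour columns into a third index level: (id, date, hour)
long = df.stack()

# unstack(0) then moves id out of the index and into the columns,
# leaving (date, hour) as the row index
wide = long.unstack(0)
print(wide)
```

Because the two ids cover different dates here, unstack fills the missing (date, hour, id) combinations with NaN, which a line plot simply leaves as gaps.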
Upvotes: 2
Reputation: 251468
Here is one approach:
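import numpy as np
from matplotlib import pyplot
# d is the question's DataFrame, indexed by (id, date)
for groupID, data in d.groupby(level='id'):
    fig = pyplot.figure()
    ax = fig.gca()
    # string all of this id's days out into one long series
    ax.plot(data.values.ravel())
    # one tick at the start of each day (24 hourly points per day)
    ax.set_xticks(np.arange(len(data))*24)
    ax.set_xticklabels(data.index.get_level_values('date'))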
ravel is a numpy method that strings multiple rows out into one long 1D array.
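A tiny illustration with hypothetical values: ravel flattens in row-major order by default, so each day's hours follow directly after the previous day's. The same arithmetic as np.arange(len(data))*24 gives the index where each day starts (here with 3 hours per day instead of 24).

```python
import numpy as np

# Two days of hypothetical data, three hours per day for brevity
days = np.array([[4, 0, 96],
                 [10, 2, 54]])

# ravel() uses row-major (C) order by default: day 1's hours, then day 2's
flat = days.ravel()
print(flat.tolist())  # [4, 0, 96, 10, 2, 54]

# Index of the first hour of each day in the flattened array
day_starts = np.arange(days.shape[0]) * days.shape[1]
print(day_starts.tolist())  # [0, 3]
```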
Beware of running this interactively on a large dataset, as it creates a separate figure for each id. If you want to save the plots or the like, set a noninteractive matplotlib backend and use savefig to save each figure, then close it before creating the next one.
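A minimal sketch of that save-and-close pattern, assuming a randomly generated stand-in for the question's data; the file naming scheme and the temporary output directory are hypothetical choices, not part of the answer above:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: nothing is shown on screen
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: (id, date) index, 24 hour columns
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product([[1, 2], [5, 6, 7]], names=["id", "date"])
d = pd.DataFrame(rng.integers(0, 100, size=(6, 24)), index=index,
                 columns=[f"hour{h}" for h in range(1, 25)])

out_dir = tempfile.mkdtemp()
saved = []
for group_id, data in d.groupby(level="id"):
    fig, ax = plt.subplots()
    ax.plot(data.to_numpy().ravel())
    ax.set_xticks(np.arange(len(data)) * 24)
    ax.set_xticklabels(data.index.get_level_values("date"))
    path = os.path.join(out_dir, f"id_{group_id}.png")  # hypothetical naming
    fig.savefig(path)
    plt.close(fig)  # release the figure before building the next one
    saved.append(path)

print(saved)
```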
Upvotes: 2