dailyglen

Reputation: 715

Pandas: how to plot yearly data on top of each other

I have a series of data indexed by time values (floats), and I want to take chunks of the series and plot them on top of each other. For example, let's say I have stock prices sampled about every 10 minutes over a period of 20 weeks, and I want to see the weekly pattern by plotting 20 lines of stock prices. The x axis spans one week, and there are 20 lines (one for the prices during each week).

Updated

The index is not uniformly spaced, and its values are floating point. It is something like:

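import math
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

# 1000 nominal sample times over 12 ns (linspace guarantees exactly 1000 points,
# whereas a float-step arange can return 1001 and break the addition below)
t = np.linspace(0, 12e-9, 1000, endpoint=False)
noise = np.random.randn(1000)/1e12
cn = noise.cumsum()
t_noise = t+cn   # jittered, non-uniform sample times
y = np.sin(2*math.pi*36e7*t_noise) + noise
df = DataFrame(y,index=t_noise,columns=["A"])
df.plot(marker='.')
plt.axis([0,0.2e-8,0,1])
plt.show()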

So the index is not uniformly spaced. I'm dealing with voltage vs. time data from a simulator. I would like to know how to choose a time window T, split df into consecutive chunks of length T, and plot them on top of each other. So if the data were 20*T long, I would have 20 lines in the same plot.

Sorry for the confusion; I used the stock analogy thinking it might help.
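A minimal sketch of the windowing asked for here, assuming the index holds times in seconds as floats (like the df above) and that windows are counted from the first sample. The helpers `split_into_windows` and `plot_windows` are hypothetical, not pandas API: each sample is assigned to chunk `floor((t - t0) / T)`, and its new x value is the time elapsed since the start of that chunk, so every chunk starts near 0 on a shared axis.

```python
import numpy as np
import pandas as pd


def split_into_windows(df, T):
    """Split a DataFrame with a float time index into consecutive windows of length T.

    Returns {window number: DataFrame}; each DataFrame is re-indexed by the time
    elapsed since the start of its own window (hypothetical helper).
    """
    t = np.asarray(df.index, dtype=float)
    offset = t - t[0]                          # time since the first sample
    window = np.floor(offset / T).astype(int)  # which chunk each sample falls in
    shifted = df.copy()
    shifted.index = offset - window * T        # time since the start of the chunk
    return {k: g for k, g in shifted.groupby(window)}


def plot_windows(windows, column="A"):
    """Overlay every window on one set of axes (matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for k, g in windows.items():
        ax.plot(g.index, g[column], marker=".", label=f"window {k}")
    ax.set_xlabel("time within window")
    ax.legend()
    return ax
```

With the data above, something like `plot_windows(split_into_windows(df, T=2e-9))` would draw one line per 2 ns chunk; the value of T is only an example. Because the index is non-uniform, chunks may hold slightly different numbers of samples, which is why each one keeps its own x values instead of being aligned by row position.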

Upvotes: 1

Views: 3886

Answers (2)

Garrett

Reputation: 49788

Assuming a pandas.TimeSeries object (a Series indexed by timestamps) as the starting point, you can group elements by ISO week number and ISO weekday with datetime.date.isocalendar(). The following statement, which ignores the ISO year, keeps the last sample of each day.

In [95]: daily = ts.groupby(lambda x: x.isocalendar()[1:]).agg(lambda s: s.iloc[-1])
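A self-contained run of this grouping step on made-up data (one sample every 10 minutes for two ISO weeks, starting Monday 2012-01-02) in place of the answer's `ts`. It uses `s.iloc[-1]` so that "last sample" is positional; `GroupBy.last()` should give the same result here because the synthetic data has no missing values.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `ts`: 10-minute samples for 14 days from Monday 2012-01-02,
# which is ISO week 1, weekday 1 of 2012
idx = pd.date_range("2012-01-02", periods=14 * 24 * 6, freq="10min")
ts = pd.Series(np.arange(len(idx)), index=idx)

# Key each timestamp by (ISO week, ISO weekday); [1:] drops the ISO year
daily = ts.groupby(lambda x: x.isocalendar()[1:]).agg(lambda s: s.iloc[-1])
print(daily.head())
```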

In [96]: daily
Out[96]: 
key_0
(1, 1)     63
(1, 2)     91
(1, 3)     73
...
(20, 5)    82
(20, 6)    53
(20, 7)    63
Length: 140

There may be a cleaner way to perform the next step, but the goal is to change the index from an array of tuples to a MultiIndex object.

In [97]: daily.index = pandas.MultiIndex.from_tuples(daily.index, names=['W', 'D'])
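One possibly cleaner route, assuming pandas 1.1 or later, where `DatetimeIndex.isocalendar()` returns a DataFrame with `year`, `week` and `day` columns: group by the week and day arrays directly, which produces a MultiIndex without building tuples first. The synthetic `ts` below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: 10-minute samples for 14 days starting Monday 2012-01-02
idx = pd.date_range("2012-01-02", periods=14 * 24 * 6, freq="10min")
ts = pd.Series(np.arange(len(idx)), index=idx)

iso = ts.index.isocalendar()              # columns: year, week, day
week = iso["week"].to_numpy(dtype=int)    # plain int arrays as group keys
day = iso["day"].to_numpy(dtype=int)

daily = (
    ts.groupby([week, day])   # two keys -> the result has a MultiIndex
      .last()                 # last sample of each (week, day)
      .rename_axis(["W", "D"])  # same level names as in the answer
)
```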

In [98]: daily
Out[98]: 
W   D
1   1    63
    2    91
    3    73
    4    88
    5    84
    6    95
    7    72
...
20  1    81
    2    53
    3    78
    4    64
    5    82
    6    53
    7    63
Length: 140

The final step is to "unstack" the weekday level from the MultiIndex, creating a column for each weekday, and to replace the weekday numbers with abbreviations to improve readability.

In [102]: dofw = "Mon Tue Wed Thu Fri Sat Sun".split()

In [103]: grid = daily.unstack('D').rename(columns=lambda x: dofw[x-1])

In [104]: grid
Out[104]: 
    Mon  Tue  Wed  Thu  Fri  Sat  Sun
W                                    
1    63   91   73   88   84   95   72
2    66   77   96   72   56   80   66
...
19   56   69   89   69   96   73   80
20   81   53   78   64   82   53   63
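A small illustration of the unstack-and-rename step, using a made-up two-week `daily` Series with the same `W`/`D` MultiIndex as above (the values are arbitrary). `unstack("D")` turns the weekday level into columns 1..7, and the `rename` lambda maps each number to its abbreviation.

```python
import pandas as pd

# Made-up daily values for ISO weeks 1 and 2, weekdays 1..7
index = pd.MultiIndex.from_product([[1, 2], range(1, 8)], names=["W", "D"])
daily = pd.Series(range(14), index=index)

dofw = "Mon Tue Wed Thu Fri Sat Sun".split()
# Move the weekday level into columns, then label columns 1..7 as Mon..Sun
grid = daily.unstack("D").rename(columns=lambda x: dofw[x - 1])
print(grid)
```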

To create a line plot for each week, transpose the DataFrame so that the columns are week numbers and the rows are weekdays, then call plot. (The transpose can be avoided by unstacking the week number instead of the weekday in the previous step.)

grid.T.plot()
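The remark about skipping the transpose can be checked on a made-up two-week `daily` Series: unstacking `W` instead of `D` gives the same table as the transposed one (before any weekday renaming), with one column per week, which is what `DataFrame.plot` turns into one line each. The helper `plot_weeks` is hypothetical and imports matplotlib only when called.

```python
import pandas as pd

# Made-up two-week `daily` Series with the answer's W/D MultiIndex
index = pd.MultiIndex.from_product([[1, 2], range(1, 8)], names=["W", "D"])
daily = pd.Series(range(14), index=index)

by_day = daily.unstack("D")    # rows = weeks, columns = weekdays (like `grid`)
by_week = daily.unstack("W")   # rows = weekdays, columns = weeks: no transpose needed


def plot_weeks(table):
    """Draw one line per column (one per week) of a weekday-by-week table."""
    import matplotlib.pyplot as plt

    ax = table.plot(marker=".")
    ax.set_xlabel("ISO weekday")
    plt.show()
    return ax
```

Calling `plot_weeks(by_week)` would draw the same picture as `grid.T.plot()`, apart from the weekday labels.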

Upvotes: 4

archlight

Reputation: 687

Let me try to answer this. Basically, I pad (reindex) the data to complete weekdays, take it in chunks of 5 days, and drop the data that is missing because of holidays or trading suspensions.

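>>> from datetime import datetime, timedelta
>>> import pandas as pd
>>> from pandas_datareader.data import DataReader   # formerly pandas.io.data.DataReader
>>> coke = DataReader('KO', 'yahoo', start=datetime(2012, 1, 1))
>>> # Monday of the week that contains the first trading day
>>> startd = coke.index[0] - timedelta(days=coke.index[0].isoweekday() - 1)
>>> # 18 complete Mon-Fri weeks (90 business days) starting on that Monday
>>> rng = pd.bdate_range(startd, periods=90)
>>> coke = coke.reindex(rng)   # pad holidays/suspensions with NaN rows
>>> chunk = []
>>> for i in range(18):
...     # rows i*5 .. i*5+4 are week i; drop the padded (missing) days
...     chunk.append(coke.iloc[i*5:(i+1)*5].dropna())
...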

Then you can loop over chunk to plot each week's data.
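A sketch of the pad-then-chunk idea on made-up data, so it runs without downloading prices; it assumes a single price Series (for example one column of the downloaded frame). `weekly_chunks` and `plot_chunks` are hypothetical helpers. One design choice differs slightly from the loop above: each week is plotted against `index.dayofweek` after `dropna()`, so a holiday leaves a gap instead of shifting the rest of that week to the left.

```python
import numpy as np
import pandas as pd


def weekly_chunks(prices, n_weeks):
    """Pad a business-day Series to full Mon-Fri weeks and cut it into
    n_weeks chunks of 5 rows each (hypothetical helper)."""
    first = prices.index[0]
    start = first - pd.Timedelta(days=first.isoweekday() - 1)  # Monday of first week
    full = prices.reindex(pd.bdate_range(start, periods=5 * n_weeks))  # NaN on gaps
    return [full.iloc[i * 5:(i + 1) * 5] for i in range(n_weeks)]


def plot_chunks(chunks):
    """Overlay the weeks, using the weekday (Mon=0 .. Fri=4) as the shared x axis."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for week, c in enumerate(chunks):
        c = c.dropna()  # drop holidays/suspensions, but keep each day's position
        ax.plot(c.index.dayofweek, c.to_numpy(), marker=".", label=f"week {week}")
    ax.set_xticks(range(5))
    ax.set_xticklabels(["Mon", "Tue", "Wed", "Thu", "Fri"])
    ax.legend()
    return ax
```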

Upvotes: 0
