Reputation: 14022
Suppose I have a dataframe indexed by datetime:
> df.head()
value
2013-01-01 00:00:00 -0.014844
2013-01-01 01:00:00 0.243548
2013-01-01 02:00:00 0.463755
2013-01-01 03:00:00 0.695867
2013-01-01 04:00:00 0.845290
(...)
If I wanted to plot all values by date, I could do:
times = [ts.date() for ts in df.index]
values = df.value
plot(times, values)  # x values first, then y values
Is there a more "pandas idiomatic" way to do it? I tried the .rename method, but I got an assertion error:
df.rename(lambda x : x.time())
What I really wanted was to do something like a boxplot:
df.boxplot(by = lambda x : x.time())
but without the standard deviation boxes (which will be substituted by estimated confidence bands). Is there a way to do this with a simple pandas command?
In case the problem is not clear: I have a datetime field as the index of the dataframe, and I need to extract only the time part and plot the values by time. This will give me lots of points with the same x value, which is fine, but the rename method seems to expect each value in the resulting index to be unique.
Upvotes: 3
Views: 5037
Reputation: 375535
You can plot natively with the DataFrame plot method, for example:
df.plot()
df.plot(kind='bar')
...
This method gives you a lot of flexibility (with all the power of matplotlib).
The visualisation section of the docs goes into a lot of detail, and has plenty of examples.
In 0.12+ there's a time method/attribute on a DatetimeIndex (IIRC due to this question):
df.index.time # equivalent to df.index.map(lambda ts: ts.time())
To plot only the times, you could use:
plot(df.index.time, df.value)
However this seems only slightly better than your solution, if at all. Perhaps timeseries index ought to offer a time method, similar to how it does for hour (I vaguely recall a similar question...):
plot(df.index.hour, df.value)
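For the "boxplot without boxes" part of the question, one possible approach is to group by the time of day and compute a centre line plus a band yourself. This is only a rough sketch: the sample data is made up, and the 5%/95% quantiles are a stand-in for whatever confidence band you actually estimate.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three days of hourly values (not the asker's data)
rng = np.random.default_rng(0)
idx = pd.date_range("2013-01-01", periods=72, freq=pd.Timedelta(hours=1))
values = np.sin(np.arange(72) * 2 * np.pi / 24) + 0.1 * rng.normal(size=72)
df = pd.DataFrame({"value": values}, index=idx)

# Group the values by the time-of-day part of the index
by_time = df["value"].groupby(df.index.time)

# One row per distinct time of day: mean plus an assumed 5%-95% band
summary = pd.DataFrame({
    "mean": by_time.mean(),
    "lower": by_time.quantile(0.05),
    "upper": by_time.quantile(0.95),
})
```

The resulting summary has one row per distinct time, so the duplicate-index problem from rename does not come up, and its three columns can be drawn as a line with a shaded band.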
Upvotes: 1
Reputation: 4710
If you want the time values, then this is fairly fast.
from datetime import time
import numpy as np
def dt_time(ind):
    return np.array([time(*time_tuple) for time_tuple in zip(ind.hour, ind.minute, ind.second)])
Calling map will be orders of magnitude slower.
In [29]: %timeit dt_time(dt)
1000 loops, best of 3: 511 µs per loop
In [30]: %timeit dt_map(dt)
10 loops, best of 3: 96.3 ms per loop
for a DatetimeIndex of length 100.
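The dt_map used in the timing is not shown above. As a sanity check, here is a sketch with an assumed definition of dt_map (calling .time() on each timestamp) that confirms both approaches give the same result. Note that dt_time drops anything below one second.

```python
from datetime import time

import numpy as np
import pandas as pd


def dt_time(ind):
    # Build datetime.time objects from the hour/minute/second fields
    return np.array([time(h, m, s) for h, m, s in zip(ind.hour, ind.minute, ind.second)])


def dt_map(ind):
    # Assumed definition (not in the original post): per-element .time() call
    return np.array([ts.time() for ts in ind])


# Hypothetical index: 100 hourly timestamps
dt = pd.date_range("2013-01-01", periods=100, freq=pd.Timedelta(hours=1))

same = bool((dt_time(dt) == dt_map(dt)).all())
```

Timings will differ by pandas version and machine, so it is worth re-running %timeit on your own index before relying on the numbers above.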
Upvotes: 1
Reputation: 97301
Here is my solution:
create the data:
import pandas as pd
from pandas import *
from numpy.random import randn
rng = date_range('1/1/2011', periods=72, freq='H')
ts = TimeSeries(randn(72), index=rng)
plot date-value:
ts.to_period("D").plot(style="o")
plot time-value:
TimeSeries(ts.values, index=DatetimeIndex(ts.index.values -
ts.index.to_period("D").to_timestamp().values)).plot(style="o")
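TimeSeries is no longer available in recent pandas releases, so the snippet above may not run as written there. The same "subtract midnight" idea can be sketched with a plain Series. The normalize() call and the conversion to fractional hours are my substitutions, not part of the code above.

```python
import numpy as np
import pandas as pd

# Same shape of data as above: 72 hourly points (values are random)
rng = pd.date_range("2011-01-01", periods=72, freq=pd.Timedelta(hours=1))
ts = pd.Series(np.random.randn(72), index=rng)

# Offset of each timestamp from midnight of its own day
offset = ts.index - ts.index.normalize()

# Express the offset as fractional hours so the x axis is plain numbers
hours = offset.total_seconds() / 3600

# Same values, re-indexed by time of day (the index repeats once per day)
by_hour = pd.Series(ts.values, index=hours)
```

Calling by_hour.plot(style="o") should then give the time-value scatter, with one point per day stacked at each hour.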
Upvotes: 1