Rafael S. Calsaverini

Reputation: 14022

Map a pandas DataFrame index

Suppose I have a dataframe indexed by datetime:

> df.head()

                        value
2013-01-01 00:00:00 -0.014844
2013-01-01 01:00:00  0.243548
2013-01-01 02:00:00  0.463755
2013-01-01 03:00:00  0.695867
2013-01-01 04:00:00  0.845290
(...)

If I wanted to plot all values by date, I could do:

times = [x.date() for x in df.index]
values = df.value
plot(times, values)

Is there a more "pandas idiomatic" way to do it? I tried the .rename method, but I got an assertion error:

df.rename(lambda x : x.time())

What I really wanted was to do something like a boxplot:

df.boxplot(by = lambda x : x.time())

but without the standard deviation boxes (which will be substituted by estimated confidence bands). Is there a way to do this with a simple pandas command?


To clarify the problem: I have a datetime field as the index of the dataframe, and I need to extract only the time part and plot the values by time. This will give me lots of points with the same x value, which is fine, but the rename method seems to expect each value in the resulting index to be unique.

Upvotes: 3

Views: 5037

Answers (3)

Andy Hayden

Reputation: 375535

You can plot natively with the DataFrame plot method, for example:

df.plot()
df.plot(kind='bar')
...

This method gives you a lot of flexibility (with all the power of matplotlib).
The visualisation section of the docs goes into a lot of detail, and has plenty of examples.


In pandas 0.12+ there's a time attribute on a DatetimeIndex (IIRC due to this question):

df.index.time  # equivalent to df.index.map(lambda ts: ts.time())
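As a minimal sketch of what that attribute returns, the snippet below builds a small hourly frame (the sample data and the name by_time are made up for illustration) and compares df.index.time with the per-element map. It also shows one possible way around rename: assigning the time values to the index directly, since an index may hold duplicate labels.

```python
from datetime import time

import numpy as np
import pandas as pd

# Hypothetical sample data: 48 hourly readings covering two days.
idx = pd.date_range("2013-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": np.arange(48, dtype=float)}, index=idx)

times = df.index.time                        # NumPy object array of datetime.time
mapped = df.index.map(lambda ts: ts.time())  # per-element alternative

# Assign the times as the new index; duplicate labels are allowed here.
by_time = df.copy()
by_time.index = times
```

Each time of day then appears once per day in by_time.index, which is the "many points with the same x" situation described in the question.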

To plot only the times, you could use:

plot(df.index.time, df.value)

However, this seems only slightly better than your solution, if at all. The index offers hour in the same way (I vaguely recall a similar question...):

plot(df.index.hour, df.value)
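For the "boxplot without boxes" part of the question, one possible approach is to group by time of day and compute a band per group. This is only a sketch: the random sample data is invented, and the normal-approximation 95% band is an assumption standing in for whatever confidence estimate the asker has in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 30 days of hourly values.
rng = np.random.default_rng(0)
idx = pd.date_range("2013-01-01", periods=24 * 30, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": rng.normal(size=len(idx))}, index=idx)

# Group rows that share a time of day and summarise each group.
stats = df.groupby(df.index.time)["value"].agg(["mean", "std", "count"])

# Normal-approximation 95% band for the mean (an assumption, see above).
half_width = 1.96 * stats["std"] / np.sqrt(stats["count"])
stats["lower"] = stats["mean"] - half_width
stats["upper"] = stats["mean"] + half_width
```

The resulting frame has one row per distinct time of day, so its mean, lower and upper columns can be handed to matplotlib (for example with fill_between) instead of drawing boxes.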

Upvotes: 1

Dale

Reputation: 4710

If you want the time values, then this is fairly fast.

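from datetime import time
import numpy as np

def dt_time(ind):
    # Build datetime.time objects from the index's vectorised hour/minute/second fields
    return np.array([time(*time_tuple) for time_tuple in zip(ind.hour, ind.minute, ind.second)])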

Calling map will be orders of magnitude slower.

In [29]: %timeit dt_time(dt)
1000 loops, best of 3: 511 µs per loop

In [30]: %timeit dt_map(dt)
10 loops, best of 3: 96.3 ms per loop

for a DatetimeIndex of length 100.
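The timings above call a dt_map function that the answer does not show. The sketch below supplies a hypothetical version of it (a plain per-element loop, which is an assumption about what was measured) and checks that both functions return the same values for an hourly index of length 100.

```python
from datetime import time

import numpy as np
import pandas as pd

def dt_time(ind):
    # Vectorised field access, then one time() call per element.
    return np.array([time(*t) for t in zip(ind.hour, ind.minute, ind.second)])

def dt_map(ind):
    # Hypothetical reconstruction; the original dt_map is not shown.
    return np.array([ts.time() for ts in ind])

dt = pd.date_range("2013-01-01", periods=100, freq=pd.Timedelta(hours=1))
same = bool((dt_time(dt) == dt_map(dt)).all())
```

Note that dt_time drops anything below whole seconds, so the two only agree when the index has no sub-second component.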

Upvotes: 1

HYRY

Reputation: 97301

Here is my solution:

create the data:

import pandas as pd
from pandas import *
from numpy.random import randn
rng = date_range('1/1/2011', periods=72, freq='H')
ts = Series(randn(72), index=rng)  # Series replaces TimeSeries, which newer pandas no longer provides

plot date-value:

ts.to_period("D").plot(style="o")
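To see why to_period("D") gives a by-date plot, it can help to inspect the converted index. The sketch below uses deterministic stand-in values rather than randn, and the names daily and n_days are illustrative only.

```python
import numpy as np
import pandas as pd

# 72 hourly points, i.e. three full days (stand-in values instead of randn).
rng = pd.date_range("2011-01-01", periods=72, freq=pd.Timedelta(hours=1))
ts = pd.Series(np.arange(72, dtype=float), index=rng)

# Every timestamp is replaced by the daily period that contains it.
daily = ts.to_period("D")
n_days = daily.index.nunique()
```

All 72 values are kept, but they now share only three distinct index labels, so each day gets a column of points.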


plot time-value:

# Index by the offset since midnight; newer pandas may reject building a DatetimeIndex from timedeltas
Series(ts.values, index=ts.index - ts.index.normalize()).plot(style="o")


Upvotes: 1
