Sachin Myneni

Reputation: 273

How to plot timedelta data from a pandas DataFrame?

I am trying to plot a Series (a column from a DataFrame, to be precise). It seems to hold valid data in hh:mm:ss format (timedelta64):

In [14]: x5.task_a.describe()
Out[14]: 
count                       165
mean     0 days 03:35:41.121212
std      0 days 07:07:40.950819
min             0 days 00:00:06
25%             0 days 00:37:13
50%             0 days 01:28:17
75%             0 days 03:41:32
max             2 days 12:32:26
Name: task_a, dtype: object

In [15]: x5.task_a.head()
Out[15]: 
wbdqueue_id
26868   00:26:11
26869   02:08:28
26872   00:26:07
26874   00:48:22
26875   00:26:17
Name: task_a, dtype: timedelta64[ns]

But when I try to plot it, I get an error saying there is no numeric data to plot. I've tried x5.task_a.plot.kde() and x5.plot(), where x5 is the DataFrame holding several Series of such timedelta data.

TypeError: Empty 'DataFrame': no numeric data to plot

I can generate a Series of random values and plot that without any problem.

What am I doing wrong?

Upvotes: 3

Views: 6194

Answers (2)

Zero

Reputation: 77027

Convert the timedeltas to a plain numeric unit, such as hours or minutes, and then use .plot.kde():

(x5.task_a / np.timedelta64(1, 'h')).plot.kde()

Details

In [149]: x5
Out[149]:
                  task_a
0 0 days 22:27:46.684800
1 1 days 00:20:43.036800
2 0 days 12:16:24.873600
3 1 days 11:10:14.880000
4 1 days 03:31:05.548800
5 1 days 05:20:52.944000
6 1 days 00:09:09.590400
7 0 days 13:53:50.179200
8 1 days 04:08:57.695999
9 0 days 14:14:53.088000

In [150]: x5.task_a / np.timedelta64(1, 'h')  # convert to hours
Out[150]:
0    22.462968
1    24.345288
2    12.273576
3    35.170800
4    27.518208
5    29.348040
6    24.152664
7    13.897272
8    28.149360
9    14.248080
Name: task_a, dtype: float64

Or convert to minutes:

In [151]: x5.task_a / np.timedelta64(1, 'm')
Out[151]:
0    1347.77808
1    1460.71728
2     736.41456
3    2110.24800
4    1651.09248
5    1760.88240
6    1449.15984
7     833.83632
8    1688.96160
9     854.88480
Name: task_a, dtype: float64

Another way, using total_seconds:

In [153]: x5.task_a.dt.total_seconds() / 60
Out[153]:
0    1347.77808
1    1460.71728
2     736.41456
3    2110.24800
4    1651.09248
5    1760.88240
6    1449.15984
7     833.83632
8    1688.96160
9     854.88480
Name: task_a, dtype: float64
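
The question also tried x5.plot() on a DataFrame with several timedelta columns. A minimal sketch of the same conversion applied to every such column at once is below. The sample data and the task_b column are made up for illustration, and the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical stand-in for x5: two timedelta64[ns] columns.
x5 = pd.DataFrame({
    "task_a": pd.to_timedelta(["00:26:11", "02:08:28", "00:26:07", "00:48:22", "00:26:17"]),
    "task_b": pd.to_timedelta(["01:00:00", "00:30:00", "03:15:00", "00:45:00", "02:00:00"]),
})

# Select only the timedelta columns, then convert each one to float hours.
td_cols = x5.select_dtypes(include="timedelta").columns
hours = x5[td_cols].apply(lambda s: s.dt.total_seconds() / 3600)

# The converted frame is ordinary float64 data, so plotting works.
ax = hours.plot()
ax.set_ylabel("hours")
```

Once the columns are floats, hours.plot.kde() should work as well, provided SciPy is installed.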

Upvotes: 9

BA.

Reputation: 934

You can convert the timedeltas to plain seconds with total_seconds, which gives a numeric column that plots fine:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

idx = pd.date_range('20140101', '20140201')
df = pd.DataFrame(index=idx)
df['col0'] = np.random.randn(len(idx))
# Gap between consecutive index timestamps, in seconds (0 for the first row).
# An index has no row-wise diff, so go through a Series first.
diff_idx = idx.to_series().diff().fillna(pd.Timedelta(0)).dt.total_seconds()
df['diff_dt'] = diff_idx
df['diff_dt'].plot()

Upvotes: 1
