Reputation: 273
I am trying to plot a Series (a column from a DataFrame, to be precise). It seems to hold valid data in the format hh:mm:ss (timedelta64).
In [14]: x5.task_a.describe()
Out[14]:
count 165
mean 0 days 03:35:41.121212
std 0 days 07:07:40.950819
min 0 days 00:00:06
25% 0 days 00:37:13
50% 0 days 01:28:17
75% 0 days 03:41:32
max 2 days 12:32:26
Name: task_a, dtype: object
In [15]: x5.task_a.head()
Out[15]:
wbdqueue_id
26868 00:26:11
26869 02:08:28
26872 00:26:07
26874 00:48:22
26875 00:26:17
Name: task_a, dtype: timedelta64[ns]
But when I try to plot it, I get an error saying there is no numeric data in the Empty 'DataFrame'. I've tried: x5.task_a.plot.kde() and x5.plot() where x5 is the DataFrame with several Series of such timedelta data.
TypeError: Empty 'DataFrame': no numeric data to plot
I see that one can generate a Series of random values and plot it without any problem.
What am I doing wrong?
Upvotes: 3
Views: 6194
Reputation: 77027
Convert the timedeltas to a plain numeric unit that makes sense for your data, such as hours or minutes, and then call .plot.kde() on the result:
(x5.task_a / np.timedelta64(1, 'h')).plot.kde()
Details
In [149]: x5
Out[149]:
task_a
0 0 days 22:27:46.684800
1 1 days 00:20:43.036800
2 0 days 12:16:24.873600
3 1 days 11:10:14.880000
4 1 days 03:31:05.548800
5 1 days 05:20:52.944000
6 1 days 00:09:09.590400
7 0 days 13:53:50.179200
8 1 days 04:08:57.695999
9 0 days 14:14:53.088000
In [150]: x5.task_a / np.timedelta64(1, 'h') # convert to hours
Out[150]:
0 22.462968
1 24.345288
2 12.273576
3 35.170800
4 27.518208
5 29.348040
6 24.152664
7 13.897272
8 28.149360
9 14.248080
Name: task_a, dtype: float64
Or convert to minutes:
In [151]: x5.task_a / np.timedelta64(1, 'm')
Out[151]:
0 1347.77808
1 1460.71728
2 736.41456
3 2110.24800
4 1651.09248
5 1760.88240
6 1449.15984
7 833.83632
8 1688.96160
9 854.88480
Name: task_a, dtype: float64
Another way is to use dt.total_seconds() and divide by 60 to get minutes:
In [153]: x5.task_a.dt.total_seconds() / 60
Out[153]:
0 1347.77808
1 1460.71728
2 736.41456
3 2110.24800
4 1651.09248
5 1760.88240
6 1449.15984
7 833.83632
8 1688.96160
9 854.88480
Name: task_a, dtype: float64
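Since the question also mentions calling x5.plot() on a DataFrame with several timedelta columns, here is a minimal sketch of applying the same conversion to every column at once. The frame below is a made-up stand-in for x5 (the column task_b and its values are assumptions, not data from the question), and plot_hours is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for x5: two timedelta64[ns] columns.
x5 = pd.DataFrame({
    "task_a": pd.to_timedelta(["00:26:11", "02:08:28", "00:26:07", "00:48:22", "00:26:17"]),
    "task_b": pd.to_timedelta(["01:00:00", "00:30:00", "03:15:00", "00:45:00", "2 days 12:32:26"]),
})

# Convert every column to float hours via total_seconds().
hours = x5.apply(lambda col: col.dt.total_seconds() / 3600)

# The same conversion, dividing each column by a one-hour timedelta.
hours_alt = x5.apply(lambda col: col / np.timedelta64(1, "h"))


def plot_hours(frame):
    """Draw one KDE curve per column (needs matplotlib and scipy installed)."""
    return frame.plot.kde()
```

Calling plot_hours(hours) should then draw the densities, because every column is float64 by that point.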
Upvotes: 9
Reputation: 934
You can convert the differences between consecutive index values to seconds with total_seconds() and plot those:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
idx = pd.date_range('20140101', '20140201')
df = pd.DataFrame(index=idx)
df['col0'] = np.random.randn(len(idx))
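# Gap between each index value and the previous one, in seconds.
# Converting the index to a Series first makes .diff() and the .dt accessor available;
# the first row has no predecessor, so its gap is filled with zero.
diff_idx = idx.to_series().diff().fillna(pd.Timedelta(0)).dt.total_seconds()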
df['diff_dt'] = diff_idx
df['diff_dt'].plot()
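With a regular daily index every gap is the same, so the plot above is a flat line. A small sketch with irregular timestamps may show the idea more clearly; the timestamps and the col0 values below are invented for illustration.

```python
import pandas as pd

# Hypothetical irregularly spaced timestamps.
idx = pd.DatetimeIndex([
    "2014-01-01 00:00",
    "2014-01-01 00:30",
    "2014-01-01 02:00",
    "2014-01-02 02:00",
])
df = pd.DataFrame({"col0": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Gap to the previous timestamp; the first row has none, so it becomes NaT.
gaps = idx.to_series().diff()

# Seconds as plain floats, with the missing first gap set to 0.
df["diff_dt"] = gaps.dt.total_seconds().fillna(0.0)
```

df["diff_dt"].plot() then works like any other float column.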
Upvotes: 1