Reputation: 33
I'm looking to create a plot with 24 hours on both the x and y axes and make a scatterplot of start and end times. I have a CSV with the ID and timestamps (including year, month, day, and time) of the events, but I only need the time of day at which each event starts and ends, regardless of the date. This is an example of the data:
ID Start date End date
431032 8/29/2014 15:33 8/29/2014 16:00
383548 7/28/2014 17:35 7/28/2014 17:45
257887 4/22/2014 19:19 4/22/2014 19:28
In other words, I need to make 'coordinates' out of hours and minutes to compare cluster data. I've never used time on both axes and haven't found an example of it. I'd really appreciate it if someone who has already done this could share some tips.
Upvotes: 3
Views: 7366
Reputation: 41327
Update: ax.plot_date is now discouraged:
plot_date exists for historic reasons and will be deprecated in the future, so datetime-like data should now directly be plotted using a standard plot function.
New example with df.plot.scatter or ax.scatter:
df = pd.DataFrame({'ID': [431032, 383548, 257887, 257887, 257887, 257887, 257887, 257887], 'Start': ['8/29/2014 15:33', '7/28/2014 17:35', '4/22/2014 19:19', '5/22/2014 09:19', '4/30/2014 03:19', '1/11/2014 12:19', '9/12/2014 09:19', '8/13/2014 06:19'], 'End': ['8/29/2014 16:00', '7/28/2014 17:45', '4/22/2014 19:28', '5/22/2014 23:28', '4/30/2014 09:28', '1/11/2014 23:28', '9/12/2014 14:28', '8/13/2014 08:28']})
# ID Start End
# 0 431032 8/29/2014 15:33 8/29/2014 16:00
# 1 383548 7/28/2014 17:35 7/28/2014 17:45
# 2 257887 4/22/2014 19:19 4/22/2014 19:28
# ...
# 7 257887 8/13/2014 06:19 8/13/2014 08:28
Convert only the time portion with to_datetime so that all rows are treated as falling within one 24-hr period:
df['Start'] = pd.to_datetime(df['Start'].str.split().str[-1]) # split on space (into date and time portions)
df['End'] = pd.to_datetime(df['End'].str.split().str[-1]) # get last split element (time portion)
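To see why this puts every row into the same 24-hour window, here is a minimal sketch. It passes an explicit format='%H:%M', which is my assumption and not part of the code above; with that format and no date in the string, pandas fills in 1900-01-01 for every row, so only the time of day differs between rows.

```python
import pandas as pd

starts = pd.Series(['8/29/2014 15:33', '7/28/2014 17:35', '4/22/2014 19:19'])

# keep only the time portion, then parse it with an explicit format
# (assumption: the times are always written as HH:MM)
times = pd.to_datetime(starts.str.split().str[-1], format='%H:%M')

# every row now shares one calendar date, so only the time of day varies
print(times.dt.normalize().nunique())
print(times.dt.strftime('%H:%M').tolist())
```

Without the explicit format, pandas has to guess how to read the strings, so stating it is a bit more predictable when the column is known to be HH:MM.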
Note: If your date columns are already a proper datetime, just use .dt.time:
# only if your date columns are already dtype datetime64[ns]
df['Start'] = pd.to_datetime(df['Start'].dt.time.astype(str))
df['End'] = pd.to_datetime(df['End'].dt.time.astype(str))
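If the string round trip feels awkward, one possible variation for columns that are already datetime64 is to subtract each row's midnight and add the result to a fixed reference day. This is only a sketch: the reference date and the StartTime column name are arbitrary choices of mine.

```python
import pandas as pd

df = pd.DataFrame({'Start': pd.to_datetime(['2014-08-29 15:33', '2014-07-28 17:35'])})

# arbitrary reference date (an assumption; any single date works)
ref_day = pd.Timestamp('2000-01-01')

# time elapsed since midnight of each row, re-attached to the reference date
df['StartTime'] = ref_day + (df['Start'] - df['Start'].dt.normalize())

print(df['StartTime'].dt.strftime('%Y-%m-%d %H:%M').tolist())
```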
Plot via df.plot.scatter and reformat the ticks to HH:MM:
ax = df.plot.scatter(x='Start', y='End')
from matplotlib.dates import DateFormatter
hh_mm = DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hh_mm)
ax.yaxis.set_major_formatter(hh_mm)
Full code:
import pandas as pd
from matplotlib.dates import DateFormatter
df = pd.DataFrame({
    'ID': [431032, 383548, 257887, 257887, 257887, 257887, 257887, 257887],
    'Start': ['8/29/2014 15:33', '7/28/2014 17:35', '4/22/2014 19:19', '5/22/2014 09:19', '4/30/2014 03:19', '1/11/2014 12:19', '9/12/2014 09:19', '8/13/2014 06:19'],
    'End': ['8/29/2014 16:00', '7/28/2014 17:45', '4/22/2014 19:28', '5/22/2014 23:28', '4/30/2014 09:28', '1/11/2014 23:28', '9/12/2014 14:28', '8/13/2014 08:28'],
})
# convert time portion to datetime
df['Start'] = pd.to_datetime(df['Start'].str.split().str[-1])
df['End'] = pd.to_datetime(df['End'].str.split().str[-1])
# plot end times vs start times
ax = df.plot.scatter(x='Start', y='End')
# reformat ticks as HH:MM
hh_mm = DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hh_mm)
ax.yaxis.set_major_formatter(hh_mm)
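The question asks for the full 24 hours on both axes, whereas by default the axes only span the range of the data. A possible way to pin both axes to midnight-to-midnight, sketched here with a small stand-in dataset, the Agg backend, and a 3-hour tick interval (all my own assumptions), is to set the limits from the shared date and use HourLocator for the ticks:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter, HourLocator

# stand-in data: times of day parsed onto one shared date
start = pd.to_datetime(pd.Series(['15:33', '17:35', '19:19']), format='%H:%M')
end = pd.to_datetime(pd.Series(['16:00', '17:45', '19:28']), format='%H:%M')

fig, ax = plt.subplots()
ax.scatter(start, end)

# midnight of the shared date and midnight of the following day
day_start = start.dt.normalize().iloc[0]
day_end = day_start + pd.Timedelta(days=1)

for axis in (ax.xaxis, ax.yaxis):
    axis.set_major_locator(HourLocator(interval=3))   # a tick every 3 hours
    axis.set_major_formatter(DateFormatter('%H:%M'))  # label ticks as HH:MM

# force both axes to cover the whole day
ax.set_xlim(day_start, day_end)
ax.set_ylim(day_start, day_end)
```

Note that with '%H:%M' the upper limit is labelled 00:00 again, since it is midnight of the next day.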
Alternatives:
ax.scatter
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(df['Start'], df['End'])
ax.xaxis.set_major_formatter(hh_mm)
ax.yaxis.set_major_formatter(hh_mm)
ax.plot_date (will be deprecated)
fig, ax = plt.subplots()
ax.plot_date(df['Start'], df['End'], ydate=True)
ax.xaxis.set_major_formatter(hh_mm)
ax.yaxis.set_major_formatter(hh_mm)
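Since the question mentions making 'coordinates' out of hours and minutes for cluster comparison, another option that avoids date axes entirely is to convert each timestamp to decimal hours. This is a different technique from the ones above, and the explicit format string and variable names are my assumptions:

```python
import pandas as pd

# parse the full timestamps (assumption: month/day/year followed by HH:MM)
start = pd.to_datetime(
    pd.Series(['8/29/2014 15:33', '7/28/2014 17:35']),
    format='%m/%d/%Y %H:%M',
)

# hours since midnight as a float, e.g. 15:33 -> 15.55
start_hours = start.dt.hour + start.dt.minute / 60

print(start_hours.round(2).tolist())
```

The resulting floats can be passed straight to ax.scatter with both axis limits set to 0 and 24, and they can also be fed to a clustering routine as plain numeric coordinates.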
Upvotes: 6