TheIvanTheory

Reputation: 33

Python plot with 24 hrs x and y axis using only hours and minutes from timestamp

I'm looking to create a plot with 24 hours on both the x and y axes and make a scatterplot of start and end times. I have a CSV with the ID and timestamps (including year, month, day and time) of the events, but I only need the time of day at which each event starts and ends, regardless of the date. This is an example of the data:

ID        Start date        End date
431032   8/29/2014 15:33    8/29/2014 16:00
383548   7/28/2014 17:35    7/28/2014 17:45
257887   4/22/2014 19:19    4/22/2014 19:28

In other words, I need to make 'coordinates' out of hours and minutes to compare cluster data. I've never used time on both axes and haven't found an example of it. I'd really appreciate it if someone who has already done this could share some tips.

Upvotes: 3

Views: 7366

Answers (1)

tdy

Reputation: 41327

Update: ax.plot_date is now discouraged:

plot_date exists for historic reasons and will be deprecated in the future, so datetime-like data should now directly be plotted using a standard plot function.

New example with df.plot.scatter or ax.scatter:

df = pd.DataFrame({'ID': [431032, 383548, 257887, 257887, 257887, 257887, 257887, 257887], 'Start': ['8/29/2014 15:33', '7/28/2014 17:35', '4/22/2014 19:19', '5/22/2014 09:19', '4/30/2014 03:19', '1/11/2014 12:19', '9/12/2014 09:19', '8/13/2014 06:19'], 'End': ['8/29/2014 16:00', '7/28/2014 17:45', '4/22/2014 19:28', '5/22/2014 23:28', '4/30/2014 09:28', '1/11/2014 23:28', '9/12/2014 14:28', '8/13/2014 08:28']})

#        ID            Start              End
# 0  431032  8/29/2014 15:33  8/29/2014 16:00
# 1  383548  7/28/2014 17:35  7/28/2014 17:45
# 2  257887  4/22/2014 19:19  4/22/2014 19:28
# ...
# 7  257887  8/13/2014 06:19  8/13/2014 08:28

  1. Convert only the time portion with pd.to_datetime so that all rows fall within the same 24-hr period:

    df['Start'] = pd.to_datetime(df['Start'].str.split().str[-1]) # split on space (into date and time portions)
    df['End'] = pd.to_datetime(df['End'].str.split().str[-1]) # get last split element (time portion)
    

    Note: If your date columns are already proper datetimes, use .dt.time instead of splitting strings:

    # only if your date columns are already dtype datetime64[ns]
    df['Start'] = pd.to_datetime(df['Start'].dt.time.astype(str))
    df['End'] = pd.to_datetime(df['End'].dt.time.astype(str))
    
  2. Plot via df.plot.scatter and reformat the ticks to HH:MM:

    ax = df.plot.scatter(x='Start', y='End')
    
    from matplotlib.dates import DateFormatter
    hh_mm = DateFormatter('%H:%M')
    ax.xaxis.set_major_formatter(hh_mm)
    ax.yaxis.set_major_formatter(hh_mm)
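
As a minimal sketch of what step 1 relies on: once only the time is parsed, every row lands on one shared date, so the scatter compares times of day only. The explicit format='%H:%M' is an assumption added here, not part of the code above. It fixes the shared date to the strptime default of 1900-01-01 instead of leaving it to whatever the parser picks (typically the current date):

```python
import pandas as pd

# sample values taken from the question
starts = pd.Series(['8/29/2014 15:33', '7/28/2014 17:35', '4/22/2014 19:19'])

# keep the time portion, then parse it with an explicit format (assumption)
times = pd.to_datetime(starts.str.split().str[-1], format='%H:%M')

# every row now shares one dummy date; only the time of day differs
shared_dates = times.dt.normalize().unique()
```

If shared_dates holds more than one value, the points would be spread over several days and the HH:MM tick labels would be misleading.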
    


Full code:

import pandas as pd
from matplotlib.dates import DateFormatter

df = pd.DataFrame({
    'ID': [431032, 383548, 257887, 257887, 257887, 257887, 257887, 257887],
    'Start': ['8/29/2014 15:33', '7/28/2014 17:35', '4/22/2014 19:19', '5/22/2014 09:19', '4/30/2014 03:19', '1/11/2014 12:19', '9/12/2014 09:19', '8/13/2014 06:19'],
    'End': ['8/29/2014 16:00', '7/28/2014 17:45', '4/22/2014 19:28', '5/22/2014 23:28', '4/30/2014 09:28', '1/11/2014 23:28', '9/12/2014 14:28', '8/13/2014 08:28'],
})

# convert time portion to datetime
df['Start'] = pd.to_datetime(df['Start'].str.split().str[-1])
df['End'] = pd.to_datetime(df['End'].str.split().str[-1])

# plot end times vs start times
ax = df.plot.scatter(x='Start', y='End')

# reformat ticks as HH:MM
hh_mm = DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(hh_mm)
ax.yaxis.set_major_formatter(hh_mm)

Alternatives:

  • ax.scatter

    import matplotlib.pyplot as plt  # needed for plt.subplots()
    fig, ax = plt.subplots()
    ax.scatter(df['Start'], df['End'])
    ax.xaxis.set_major_formatter(hh_mm)
    ax.yaxis.set_major_formatter(hh_mm)
    
  • ax.plot_date (will be deprecated)

    fig, ax = plt.subplots()
    ax.plot_date(df['Start'], df['End'], ydate=True)
    ax.xaxis.set_major_formatter(hh_mm)
    ax.yaxis.set_major_formatter(hh_mm)
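
The question asks for a full 24 hours on both axes, while the plots above autoscale to the range of the data. One possible way to pin both axes to 00:00 through 24:00 is sketched below. The explicit format='%H:%M' (which puts every row on the dummy date 1900-01-01), the 3-hour tick spacing, and the names day_start and day_end are assumptions for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter, HourLocator

df = pd.DataFrame({
    'Start': ['8/29/2014 15:33', '7/28/2014 17:35', '4/22/2014 19:19'],
    'End': ['8/29/2014 16:00', '7/28/2014 17:45', '4/22/2014 19:28'],
})

# explicit format (assumption) anchors every time on 1900-01-01
for col in ['Start', 'End']:
    df[col] = pd.to_datetime(df[col].str.split().str[-1], format='%H:%M')

# hypothetical helper names for the bounds of that single dummy day
day_start = pd.Timestamp('1900-01-01')
day_end = day_start + pd.Timedelta(days=1)

fig, ax = plt.subplots()
ax.scatter(df['Start'], df['End'])

for axis in (ax.xaxis, ax.yaxis):
    axis.set_major_locator(HourLocator(interval=3))   # tick every 3 hours
    axis.set_major_formatter(DateFormatter('%H:%M'))  # label ticks as HH:MM

# set the limits after plotting so the axes already use date units
ax.set_xlim(day_start, day_end)
ax.set_ylim(day_start, day_end)
```

With both axes covering the same day, points above the diagonal are events that end later in the day than they start.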
    

Upvotes: 6
