Small Chimp

Reputation: 63

Making a timeline graph with a dataframe with grouped values needing a for loop

If I have the dataframe:

values               start time   end time
Ed, Taylor, Liv       0:00:00      0:00:15 
Ed, Liv, Peter        0:00:15      0:00:30
Taylor, Liv, Peter    0:00:30      0:00:49
Ed, Liv, Peter        0:00:49      0:01:02

How could I iterate over values and create a timeline (most likely in matplotlib, maybe plt.broken_barh()) that plots the segments of time during which each name appears in the "values" column? For example, the X axis would span 0:00:00 to 0:01:02 (the min and max values present), and the bar for Ed would go from 0:00:00 to 0:00:15 and from 0:00:15 to 0:00:30, be absent from 0:00:30 to 0:00:49, and come back from 0:00:49 to 0:01:02. After iterating through Ed, it would do Taylor, Liv, and then Peter (the values that would be contained in values.unique()) to finish with a graph of 4 bars, each with missing segments where there is no time series value for that element of "values".

I'm fairly unfamiliar with time series data, especially when the value I'm looking to plot is just the presence of a string within a column as opposed to a value like money or temperature. Basically all I'm looking for is whether the value is present on a timeline or not.

Upvotes: 1

Views: 1025

Answers (1)

JohanC

Reputation: 80509

The dataframe as set up is not straightforward to use. Because all the names are combined in one compound string, they need to be separated before they can be used.

The timestamps can be converted to pandas timestamps using pd.to_datetime.
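Since the times in the question are elapsed times rather than calendar dates, one possible alternative (not what the code in this answer uses) is pd.to_timedelta, which gives plain durations that can be turned into seconds. A minimal sketch, assuming every string has the H:MM:SS form shown in the question; the variable names are only illustrative:

```python
import pandas as pd

# The boundary times from the question, as strings
times = pd.Series(['0:00:00', '0:00:15', '0:00:30', '0:00:49', '0:01:02'])

# Parse the strings as durations instead of clock times on a date
as_durations = pd.to_timedelta(times)

# Convert each duration to a number of seconds, usable as a plain numeric x axis
elapsed_seconds = as_durations.dt.total_seconds()
print(elapsed_seconds.tolist())
```

With numeric seconds the x axis needs no date converters, at the cost of losing the clock-style tick labels.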

Here is a way to display the data. Many other approaches are possible, such as creating a column for each person with a boolean to tell whether they are included in the values column.

from matplotlib import pyplot as plt
import pandas as pd
from pandas.plotting import register_matplotlib_converters

# Register pandas' datetime converters with matplotlib
register_matplotlib_converters()

df = pd.DataFrame([['Ed, Taylor, Liv', '0:00:00', '0:00:15'],
                   ['Ed, Liv, Peter', '0:00:15', '0:00:30'],
                   ['Taylor, Liv, Peter', '0:00:30', '0:00:49'],
                   ['Ed, Liv, Peter', '0:00:49', '0:01:02']],
                  columns=['values', 'start time', 'end time'])
# Parse the time strings into pandas timestamps
df['start time'] = pd.to_datetime(df['start time'])
df['end time'] = pd.to_datetime(df['end time'])

# Split the compound strings into individual names and map each name to a row index
persons_set = {name.strip() for names in df['values'] for name in names.split(",")}
persons = {p: i for i, p in enumerate(sorted(persons_set))}
print(persons)
for person, row in persons.items():
    # Collect a (start, duration) pair for every interval that includes this person
    periods = []
    for names, start, end in zip(df['values'], df['start time'], df['end time']):
        if person in {name.strip() for name in names.split(",")}:
            periods.append((start, end - start))
    # Draw this person's bar, leaving gaps for the intervals where they are absent
    plt.broken_barh(periods, (row - 0.45, 0.9),
                    facecolors=plt.cm.plasma(row / len(persons)))

# Label each row with the person's name
plt.yticks(range(len(persons)), list(persons))
plt.show()
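
The boolean-column alternative mentioned above could be sketched roughly as follows. This is only one way to do it, not part of the original code: the names name_lists, all_names, present and ed_rows are made up for illustration, and the times are left as strings to keep the example short.

```python
import pandas as pd

df = pd.DataFrame([['Ed, Taylor, Liv', '0:00:00', '0:00:15'],
                   ['Ed, Liv, Peter', '0:00:15', '0:00:30'],
                   ['Taylor, Liv, Peter', '0:00:30', '0:00:49'],
                   ['Ed, Liv, Peter', '0:00:49', '0:01:02']],
                  columns=['values', 'start time', 'end time'])

# Turn each compound string into a list of stripped names
name_lists = df['values'].apply(
    lambda names: [name.strip() for name in names.split(',')])

# Every distinct name, in alphabetical order
all_names = sorted({name for names in name_lists for name in names})

# One boolean column per person: True where that person is in the row
present = pd.DataFrame(
    {name: name_lists.apply(lambda names: name in names) for name in all_names})

# Select the intervals for one person with an ordinary boolean mask
ed_rows = df.loc[present['Ed'], ['start time', 'end time']]
print(present)
print(ed_rows)
```

Each column of present can then be used as a mask to pick out the rows, and so the start and end times, that belong to one bar.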

resulting plot

Upvotes: 2
