Reputation: 63
If I have the dataframe:
values               start time   end time
Ed, Taylor, Liv      0:00:00      0:00:15
Ed, Liv, Peter       0:00:15      0:00:30
Taylor, Liv, Peter   0:00:30      0:00:49
Ed, Liv, Peter       0:00:49      0:01:02
How could I iterate over values and create a timeline (most likely in matplotlib, maybe plt.broken_barh()) that plots the segments of time during which each name appears in the "values" column? For example, the X axis would span 0:00:00 to 0:01:02 (the min and max values present), and the bar for Ed would go from 0:00:00 to 0:00:15 and from 0:00:15 to 0:00:30, be absent from 0:00:30 to 0:00:49, and come back from 0:00:49 to 0:01:02. After iterating through Ed, it would do Taylor, Liv, and then Peter (the values that would be contained in values.unique()), finishing with a graph of 4 bars with missing segments wherever there is no time series value for that element of "values".
I'm fairly unfamiliar with time series data, especially when the value I'm looking to plot is just the presence of a string within a column as opposed to a value like money or temperature. Basically all I'm looking for is whether the value is present on a timeline or not.
Upvotes: 1
Views: 1025
Reputation: 80509
The way the dataframe is set up is not so straightforward to use. As all the names are put together in a compound string, they need to be separated to be useable.
The timestamps can be converted to pandas timestamps using pd.to_datetime.
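Since the strings in the question look like elapsed times rather than clock times, another option is to parse them as durations with pd.to_timedelta and plot plain seconds on the x axis. The following is only a minimal sketch under that assumption; the variable names (start_s, end_s, segments) are made up for illustration.

```python
import pandas as pd

# Start/end strings copied from the question's dataframe.
start = pd.Series(['0:00:00', '0:00:15', '0:00:30', '0:00:49'])
end = pd.Series(['0:00:15', '0:00:30', '0:00:49', '0:01:02'])

# Parse the 'H:MM:SS' strings as durations and convert them to seconds.
start_s = pd.to_timedelta(start).dt.total_seconds()
end_s = pd.to_timedelta(end).dt.total_seconds()

# (start, width) pairs in seconds, the shape broken_barh takes for xranges.
segments = list(zip(start_s, end_s - start_s))
print(segments)
```

With numeric seconds, no date converter is needed, at the cost of formatting the tick labels yourself if you want them shown as H:MM:SS.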
Here is a way to display the data. Many other approaches are possible, such as creating a column for each person with a boolean to tell whether they are included in the values column.
from matplotlib import pyplot as plt
import pandas as pd
from pandas.plotting import register_matplotlib_converters

# Let matplotlib understand pandas datetime types on the axes
register_matplotlib_converters()

df = pd.DataFrame([['Ed, Taylor, Liv', '0:00:00', '0:00:15'],
                   ['Ed, Liv, Peter', '0:00:15', '0:00:30'],
                   ['Taylor, Liv, Peter', '0:00:30', '0:00:49'],
                   ['Ed, Liv, Peter', '0:00:49', '0:01:02']],
                  columns=['values', 'start time', 'end time'])
df['start time'] = pd.to_datetime(df['start time'])
df['end time'] = pd.to_datetime(df['end time'])

# Split the compound strings into individual names and give each name a row index
persons_set = set(name.strip() for names in df['values'] for name in names.split(","))
persons = {p: i for i, p in enumerate(sorted(persons_set))}
print(persons)

for person in persons:
    # Collect (start, duration) pairs for every row in which this person appears
    periods = []
    for names, start, end in zip(df['values'], df['start time'], df['end time']):
        if person in set(name.strip() for name in names.split(",")):
            periods.append((start, end - start))
    # One horizontal bar per person, centered on that person's row index
    plt.broken_barh(periods, (persons[person] - 0.45, 0.9),
                    facecolors=plt.cm.plasma(persons[person] / len(persons)))

# Label each row with the person's name
plt.yticks(range(len(persons)), list(persons))
plt.show()
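The alternative mentioned above, one boolean column per person, could be sketched roughly as follows. This assumes the names are always separated by exactly a comma and a space, as in the sample data; the names present and wide are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Ed, Taylor, Liv', '0:00:00', '0:00:15'],
     ['Ed, Liv, Peter', '0:00:15', '0:00:30'],
     ['Taylor, Liv, Peter', '0:00:30', '0:00:49'],
     ['Ed, Liv, Peter', '0:00:49', '0:01:02']],
    columns=['values', 'start time', 'end time'])

# Split each compound string on ', ' and build one indicator column per name,
# then turn the 0/1 indicators into booleans.
present = df['values'].str.get_dummies(sep=', ').astype(bool)

# Put the boolean columns next to the original time columns.
wide = pd.concat([df[['start time', 'end time']], present], axis=1)
print(wide)
```

Each person's segments are then the rows where their column is True, for example wide[wide['Ed']].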
Upvotes: 2