How to create a time chart display in python pandas

Question

I have the following information dataframe that logs quiz attempts in a given alloted period and also how long each takes. Each student is given a 25 minute period to complete a test and they can complete as many as they want.

my data is:

# Import pandas library 
import pandas as pd
import numpy as np

# data
data = [['tom', 1,1,1,'0:00','10:26']
        ,['tom', 1,1,2,'15:30','18:50']
        ,['tom', 1,2,1,'2:00','9:15']
        ,['tom', 1,2,2,'13:10','22:40']
        ,['tom', 2,1,1,'5:00','22:15']
        ,['tom', 2,2,1,'0:00','13:40']]

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Name', 'Day','AllottedPeriod','AttemptNo','StartTime','EndTime']) 
df

And what I am trying to do is create a simple visual something like the following to visualise diffences in attempt timings of students. I'm not quite sure how to go about it but does anyone have any ideas how to achieve this? Thanks very much!

kait · Accepted Answer

Assumptions:

Let me know if I misunderstood something (for example, if you only want to include the first attempt)

There is a row for each combination of Name, Day, AllottedPeriod, and AttemptNo.

Time ranges are grouped by 5 minutes.

05 == 1 if test was active anytime between 00:00 & 5:00
10 == 1 if test was active anytime between 05:00 & 10:00
15 == 1 if test was active anytime between 10:00 & 15:00
20 == 1 if test was active anytime between 15:00 & 20:00
25 == 1 if test was active anytime between 20:00 & 25:00

import pandas as pd
import numpy as np

data = [
    ['tom', 1, 1, 1, '0:00', '10:26'],
    ['tom', 1, 1, 2, '15:30', '18:50'],
    ['tom', 1, 2, 1, '2:00', '9:15'],
    ['tom', 1, 2, 2, '13:10', '22:40'],
    ['tom', 2, 1, 1, '5:00', '22:15'],
    ['tom', 2, 2, 1, '0:00', '13:40']
]
# Create the pandas DataFrame
df = pd.DataFrame(
    columns=['Name',
             'Day',
             'AllottedPeriod',
             'AttemptNo',
             'StartTime',
             'EndTime'],
    data=data,
)


def parse_time_periods(x):
    start_minute, start_second = x['StartTime'].split(':')
    end_minute, end_second = x['EndTime'].split(':')
    # calculate the start and end time in seconds
    start = (int(start_minute) * 60) + int(start_second)
    end = (int(end_minute) * 60) + int(end_second)
    test_range = range(start, end)
    for i in range(5, 26, 5):
        # create range to check for intercection with testing time
        time_range = range((i - 5) * 60, i * 60)
        # create variables to indicate if there is overlap between
        # test time and minute range. For example, if a test was active
        # between minute 10 and minute 15, column `15` will be 1
        if len(set(test_range).intersection(time_range)) > 0:
            x[f'{i:02}'] = 1
    return x


df = df.apply(lambda x: parse_time_periods(x), axis=1).fillna(0)

stacked_df = df.groupby(
    by=['Name', 'Day', 'AttemptNo', 'AllottedPeriod']
).agg(max).unstack().swaplevel(0, 1, 1).sort_index(1)

display(
    stacked_df
    .style
    .format(formatter='{:0,.0f}', subset=stacked_df.select_dtypes('float').columns)
    .background_gradient(cmap='RdYlGn')
)

End result

If you want to drop the StartTime and EndTime columns, you can modify the code like this

stacked_df = df.drop(columns=['StartTime', 'EndTime']).groupby(
    by=['Name', 'Day', 'AttemptNo', 'AllottedPeriod']
).agg(max).unstack().swaplevel(0, 1, 1).sort_index(1)

display(
    stacked_df
    .style
    .format(formatter='{:0,.0f}', subset=stacked_df.select_dtypes('float').columns)
    .background_gradient(cmap='RdYlGn')
)

If you wanted to change the time ranges, e.g. grouping by 1 minute, not 5, you could modify the code like this:

def parse_time_periods(x):
    start_minute, start_second = x['StartTime'].split(':')
    end_minute, end_second = x['EndTime'].split(':')
    # calculate the start and end time in seconds
    start = (int(start_minute) * 60) + int(start_second)
    end = (int(end_minute) * 60) + int(end_second)
    test_range = range(start, end)
    for i in range(1, 26, 1):
        # create range to check for intercection with testing time
        time_range = range((i - 1) * 60, i * 60)
        if len(set(test_range).intersection(time_range)) > 0:
            x[f'{i:02}'] = 1
    return x

If you wanted to drop the second attempt, you could do something like this:

df = df[df.AttemptNo == 1].drop(columns=['AttemptNo'])
stacked_df = df.groupby(
    by=['Name', 'Day', 'AllottedPeriod']
).agg(max).unstack().swaplevel(0, 1, 1).sort_index(1)

display(
    stacked_df
    .style
    .format(formatter='{:0,.0f}', subset=stacked_df.select_dtypes('float').columns)
    .background_gradient(cmap='RdYlGn')
)

If you wanted to include both attempts, you can use this:

stacked_df = df.drop(columns=['StartTime', 'EndTime','AttemptNo']).groupby(
    by=['Name', 'Day', 'AllottedPeriod']
).agg(sum).unstack().swaplevel(0, 1, 1).sort_index(1)

display(
    stacked_df
    .style
    .format(formatter='{:0,.0f}', subset=stacked_df.select_dtypes('float').columns)
    .background_gradient(cmap='RdYlGn',axis=1)
)

How to create a time chart display in python pandas

Answers (1)

Assumptions:

End result

Related Questions