Reputation: 1792
I have the following information dataframe that logs quiz attempts in a given alloted period and also how long each takes. Each student is given a 25 minute period to complete a test and they can complete as many as they want.
my data is:
# Import pandas library
import pandas as pd
import numpy as np
# data
data = [['tom', 1,1,1,'0:00','10:26']
,['tom', 1,1,2,'15:30','18:50']
,['tom', 1,2,1,'2:00','9:15']
,['tom', 1,2,2,'13:10','22:40']
,['tom', 2,1,1,'5:00','22:15']
,['tom', 2,2,1,'0:00','13:40']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Day','AllottedPeriod','AttemptNo','StartTime','EndTime'])
df
And what I am trying to do is create a simple visual something like the following to visualise diffences in attempt timings of students. I'm not quite sure how to go about it but does anyone have any ideas how to achieve this? Thanks very much!
Upvotes: 0
Views: 280
Reputation: 1357
Let me know if I misunderstood something (for example, if you only want to include the first attempt)
Time ranges are grouped by 5 minutes.
05 == 1
if test was active anytime between 00:00 & 5:0010 == 1
if test was active anytime between 05:00 & 10:0015 == 1
if test was active anytime between 10:00 & 15:0020 == 1
if test was active anytime between 15:00 & 20:0025 == 1
if test was active anytime between 20:00 & 25:00import pandas as pd
import numpy as np
data = [
['tom', 1, 1, 1, '0:00', '10:26'],
['tom', 1, 1, 2, '15:30', '18:50'],
['tom', 1, 2, 1, '2:00', '9:15'],
['tom', 1, 2, 2, '13:10', '22:40'],
['tom', 2, 1, 1, '5:00', '22:15'],
['tom', 2, 2, 1, '0:00', '13:40']
]
# Create the pandas DataFrame
df = pd.DataFrame(
columns=['Name',
'Day',
'AllottedPeriod',
'AttemptNo',
'StartTime',
'EndTime'],
data=data,
)
def parse_time_periods(x):
start_minute, start_second = x['StartTime'].split(':')
end_minute, end_second = x['EndTime'].split(':')
# calculate the start and end time in seconds
start = (int(start_minute) * 60) + int(start_second)
end = (int(end_minute) * 60) + int(end_second)
test_range = range(start, end)
for i in range(5, 26, 5):
# create range to check for intercection with testing time
time_range = range((i - 5) * 60, i * 60)
# create variables to indicate if there is overlap between
# test time and minute range. For example, if a test was active
# between minute 10 and minute 15, column `15` will be 1
if len(set(test_range).intersection(time_range)) > 0:
x[f'{i:02}'] = 1
return x
df = df.apply(lambda x: parse_time_periods(x), axis=1).fillna(0)
stacked_df = df.groupby(
by=['Name', 'Day', 'AttemptNo', 'AllottedPeriod']
).agg(max).unstack().swaplevel(0, 1, 1).sort_index(1)
display(
stacked_df
.style
.format(formatter='{:0,.0f}', subset=stacked_df.select_dtypes('float').columns)
.background_gradient(cmap='RdYlGn')
)
If you want to drop the StartTime
and EndTime
columns, you can modify the code like this
stacked_df = df.drop(columns=['StartTime', 'EndTime']).groupby(
by=['Name', 'Day', 'AttemptNo', 'AllottedPeriod']
).agg(max).unstack().swaplevel(0, 1, 1).sort_index(1)
display(
stacked_df
.style
.format(formatter='{:0,.0f}', subset=stacked_df.select_dtypes('float').columns)
.background_gradient(cmap='RdYlGn')
)
If you wanted to change the time ranges, e.g. grouping by 1 minute, not 5, you could modify the code like this:
def parse_time_periods(x):
start_minute, start_second = x['StartTime'].split(':')
end_minute, end_second = x['EndTime'].split(':')
# calculate the start and end time in seconds
start = (int(start_minute) * 60) + int(start_second)
end = (int(end_minute) * 60) + int(end_second)
test_range = range(start, end)
for i in range(1, 26, 1):
# create range to check for intercection with testing time
time_range = range((i - 1) * 60, i * 60)
if len(set(test_range).intersection(time_range)) > 0:
x[f'{i:02}'] = 1
return x
If you wanted to drop the second attempt, you could do something like this:
df = df[df.AttemptNo == 1].drop(columns=['AttemptNo'])
stacked_df = df.groupby(
by=['Name', 'Day', 'AllottedPeriod']
).agg(max).unstack().swaplevel(0, 1, 1).sort_index(1)
display(
stacked_df
.style
.format(formatter='{:0,.0f}', subset=stacked_df.select_dtypes('float').columns)
.background_gradient(cmap='RdYlGn')
)
If you wanted to include both attempts, you can use this:
stacked_df = df.drop(columns=['StartTime', 'EndTime','AttemptNo']).groupby(
by=['Name', 'Day', 'AllottedPeriod']
).agg(sum).unstack().swaplevel(0, 1, 1).sort_index(1)
display(
stacked_df
.style
.format(formatter='{:0,.0f}', subset=stacked_df.select_dtypes('float').columns)
.background_gradient(cmap='RdYlGn',axis=1)
)
Upvotes: 1