Reputation: 146
I am new in python and pandas and even though I researched a lot about intervals, I couldnt find any solution for my problem, I hope someone can help
Here is a sample of my DF
df = pd.DataFrame(
data=[['Mel Gibson', 'German', '2021-9-23 14:22:38', 301 ],
['Jim Carrey', 'German', '2021-9-23 14:27:39', 1041 ],
['Mel Gibson', 'German','2021-9-24 13:33:22',12]],
columns=['specialist', 'Language', 'Interval Start', 'status_duration']
)
df['Interval Start'] = pd.to_datetime(df['Interval Start'])
What I want to do is transform status duration to 15 minute intervals and group them per specialist and per day.
My desired output should be like below:
df = pd.DataFrame(
data=[['Mel Gibson', 'German', '2021-9-23 14:15:00', 301 ],
['Jim Carrey', 'German', '2021-9-23 14:15:00', 141 ],
['Jim Carrey', 'German', '2021-9-23 14:30:00', 900 ],
['Mel Gibson', 'German','2021-9-24 13:30:00',12]],
columns=['specialist', 'Language', 'Interval Start', 'status_duration']
)
So basically I need to split the seconds in status duration in 15 minute intervals until there is no remaining duration.
Edit:
My original data is like this:
df = pd.DataFrame(
data=[['Mel Gibson', 'German', '2021-9-23 14:22:38', 301 ],
['Mel Gibson', 'German', '2021-9-23 14:27:40', 4678 ],
['Mel Gibson', 'German','2021-9-24 13:33:22',12]],
columns=['specialist', 'Language', 'Interval Start', 'status_duration']
)
df['Interval Start'] = pd.to_datetime(df['Interval Start'])
The code from Henry gives me output for only the first row, second row is skipped.
Also lets say if a call has started at 10:35:00, this interval's(10:30-10:45) can not exceed 600 seconds as there are only 10 minutes left from the start time.
Upvotes: 4
Views: 2068
Reputation: 5301
You can use the dt.floor()
function for the rounding:
df['Interval Start'] = df['Interval Start'].dt.floor("15min")
Result (based on your edited data):
specialist Language Interval Start status_duration
0 Mel Gibson German 2021-09-23 14:15:00 301
1 Mel Gibson German 2021-09-23 14:15:00 4678
2 Mel Gibson German 2021-09-24 13:30:00 12
Then I added a column containing the number of intervals that you expect:
df['len'] = 1 + df['status_duration']//900
Result:
0 Mel Gibson German 2021-09-23 14:15:00 301 1
1 Mel Gibson German 2021-09-23 14:15:00 4678 6
2 Mel Gibson German 2021-09-24 13:30:00 12 1
You can then use numpy.repeat()
to duplicate the according rows and list comprehension with timedelta()
to build the according intervals.
import numpy as np
from datetime import timedelta
new_df = pd.DataFrame({'specialist': np.repeat(df['specialist'], df['len']),
'Language': np.repeat(df['Language'], df['len']),
'Interval Start': [el for sublist in [[x['Interval Start'] + timedelta(minutes=15*y) for y in range(0, x['len'])] for i, x in df.iterrows()] for el in sublist],
'status_duration': [el for sublist in [([900]*(x['len']-1)+[x['status_duration']%900]) for i, x in df.iterrows()] for el in sublist]
})
Result:
specialist Language Interval Start status_duration
0 Mel Gibson German 2021-09-23 14:15:00 301
1 Mel Gibson German 2021-09-23 14:15:00 900
1 Mel Gibson German 2021-09-23 14:30:00 900
1 Mel Gibson German 2021-09-23 14:45:00 900
1 Mel Gibson German 2021-09-23 15:00:00 900
1 Mel Gibson German 2021-09-23 15:15:00 900
1 Mel Gibson German 2021-09-23 15:30:00 178
2 Mel Gibson German 2021-09-24 13:30:00 12
Finally, you may want to reset the index:
new_df = new_df.reset_index(drop=True)
Result:
specialist Language Interval Start status_duration
0 Mel Gibson German 2021-09-23 14:15:00 301
1 Mel Gibson German 2021-09-23 14:15:00 900
2 Mel Gibson German 2021-09-23 14:30:00 900
3 Mel Gibson German 2021-09-23 14:45:00 900
4 Mel Gibson German 2021-09-23 15:00:00 900
5 Mel Gibson German 2021-09-23 15:15:00 900
6 Mel Gibson German 2021-09-23 15:30:00 178
7 Mel Gibson German 2021-09-24 13:30:00 12
Upvotes: 1
Reputation: 22503
One way is to make use of the quotient and remainder of status_duration
, explode
the result and finally add up the time by seconds:
ref = (df.groupby(["specialist", "Language", pd.Grouper(key="Interval Start", freq="D")], as_index=False)
.agg(status_duration=("status_duration", lambda d: [*([900]*(d.iat[0]//900)), d.iat[0]%900]),
Interval=("Interval Start", "first"))
.explode("status_duration"))
ref["Interval"] = ref["Interval"].dt.floor("15min")+pd.to_timedelta(ref.groupby(ref.index).cumcount()*900, unit="sec")
print (ref)
specialist Language status_duration Interval
0 Jim Carrey German 900 2021-09-23 14:15:00
0 Jim Carrey German 141 2021-09-23 14:30:00
1 Mel Gibson German 301 2021-09-23 14:15:00
2 Mel Gibson German 12 2021-09-24 13:30:00
Upvotes: 4