I have a DataFrame in which each row represents a record produced by PBS. I want to know the number of running cores in each 30-minute time period. The first 4 rows of my table:
datetime walltime ncores
2019-07-18 11:18:27 2:05:10 2
2019-07-18 11:18:45 00:50:27 1
2019-07-18 11:18:46 00:07:20 1
2019-07-18 11:18:50 00:31:34 1
I find it impossible to build a PeriodIndex whose elements are each record's own Period, because the walltime used by each record differs.
I thought I could instead create a PeriodIndex with a frequency of 30 minutes, then assign the number of cores of all records running within a certain Period to the corresponding Period. But I can't figure out how to do this.
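The plan above (a fixed 30-minute grid, then summing the cores of every record running in each slot) can be sketched without a PeriodIndex as an interval-overlap test. This is a minimal sketch, assuming each job occupies the half-open interval [datetime, datetime + walltime) and that the data is small enough for an n_slots × n_records boolean matrix; the names `slots`, `active` and `running` are illustrative, not from the question.

```python
import pandas as pd

# Sample records from the question (walltime given as H:MM:SS strings)
df = pd.DataFrame({
    "datetime": ["2019-07-18 11:18:27", "2019-07-18 11:18:45",
                 "2019-07-18 11:18:46", "2019-07-18 11:18:50"],
    "walltime": ["2:05:10", "00:50:27", "00:07:20", "00:31:34"],
    "ncores": [2, 1, 1, 1],
})
start = pd.to_datetime(df["datetime"])
end = start + pd.to_timedelta(df["walltime"])

# Fixed 30-minute grid covering every record: slot i is [slots[i], slots[i] + 30min)
slots = pd.date_range(start.min().floor("30min"), end.max().floor("30min"), freq="30min")
slot_start = slots.values[:, None]                       # shape (n_slots, 1)
slot_end = (slots + pd.Timedelta("30min")).values[:, None]

# A record is running in a slot if its [start, end) interval overlaps the slot
active = (start.values[None, :] < slot_end) & (end.values[None, :] > slot_start)

# Sum the cores of all records active in each slot
running = pd.Series(active.astype(int) @ df["ncores"].to_numpy(),
                    index=slots, name="ncores")
print(running)
```

With the strict `>` test, a job ending exactly on a slot boundary (e.g. 12:00:00) is not counted in the slot that starts there; use `>=` if it should be.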
What I expect is:
datetime cputime ncores
2019-07-18 11:00:00 5
2019-07-18 11:30:00 4
2019-07-18 12:00:00 3
2019-07-18 12:30:00 2
I think you need:
import pandas as pd

#convert to datetimes and timedeltas
df['datetime'] = pd.to_datetime(df['datetime'])
df['walltime'] = pd.to_timedelta(df['walltime'])
#floor each start to its 30min slot (so periods begin at :00/:30) and compute the real end time
df['start'] = df['datetime'].dt.floor('30min')
df['end'] = df['datetime'] + df['walltime']
#one (30min period, ncores) pair for every period a record is running in
zipped = zip(df['start'], df['end'], df['ncores'])
L = [(i, n) for s, e, n in zipped for i in pd.period_range(s, e, freq='30min')]
#DataFrame is aggregated by sum
df1 = (pd.DataFrame(L, columns=['datetime cputime', 'summed'])
         .groupby('datetime cputime', as_index=False)['summed']
         .sum())
print(df1)
datetime cputime summed
0 2019-07-18 11:00 5
1 2019-07-18 11:30 4
2 2019-07-18 12:00 3
3 2019-07-18 12:30 2
4 2019-07-18 13:00 2
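The list comprehension can also be expressed with `pd.date_range` plus `DataFrame.explode` (pandas ≥ 0.25), giving one row per (record, slot) before the `groupby`. This is a sketch of that alternative, not the answer's method; the column name `slot` is hypothetical, and flooring the start is what makes the generated timestamps land on :00/:30 boundaries.

```python
import pandas as pd

# Sample records from the question
df = pd.DataFrame({
    "datetime": ["2019-07-18 11:18:27", "2019-07-18 11:18:45",
                 "2019-07-18 11:18:46", "2019-07-18 11:18:50"],
    "walltime": ["2:05:10", "00:50:27", "00:07:20", "00:31:34"],
    "ncores": [2, 1, 1, 1],
})
df["datetime"] = pd.to_datetime(df["datetime"])
df["walltime"] = pd.to_timedelta(df["walltime"])

# Align each start to its 30-minute slot; keep the true end time
start = df["datetime"].dt.floor("30min")
end = df["datetime"] + df["walltime"]

# One DatetimeIndex of slot starts per record (hypothetical column "slot")
df["slot"] = [pd.date_range(s, e, freq="30min") for s, e in zip(start, end)]

# explode -> one row per (record, slot); then sum the cores per slot
df1 = (df.explode("slot")
         .groupby("slot", as_index=False)["ncores"]
         .sum())
print(df1)
```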