How can I create a DataFrame with indexes of different Periods

Question

I have a DataFrame in which each row represents a record produced by PBS. Now I want to know the number of running cores in each time period (30 minutes). The first 4 rows of my table:

datetime             walltime  ncores
2019-07-18 11:18:27  2:05:10   2
2019-07-18 11:18:45  00:50:27  1
2019-07-18 11:18:46  00:07:20  1
2019-07-18 11:18:50  00:31:34  1

I find it impossible to build a PeriodIndex directly from these records, because the walltime used by each record is different.

I thought I could create a PeriodIndex with a freq of 30 minutes, then assign the number of cores of all records within a certain Period to the corresponding Period. But I can't figure out how to do this.

What I expect is:

    datetime cputime    ncores
    2019-07-18 11:00:00      5
    2019-07-18 11:30:00     4
    2019-07-18 12:00:00     3
    2019-07-18 12:30:00     2

jezrael · Accepted Answer

I think you need:

import pandas as pd

# convert the columns to datetimes and timedeltas
df['datetime'] = pd.to_datetime(df['datetime'])
df['walltime'] = pd.to_timedelta(df['walltime'])

# end time of each record (start + walltime); period_range below snaps both ends to 30-minute periods
df['end'] = df['datetime'] + df['walltime']

# one (period, ncores) pair for every 30-minute period a record overlaps
zipped = zip(df['datetime'], df['end'], df['ncores'])
L = [(i, n) for s, e, n in zipped for i in pd.period_range(s, e, freq='30min')]

# aggregate the core counts per period
df1 = (pd.DataFrame(L, columns=['datetime cputime', 'summed'])
        .groupby('datetime cputime', as_index=False)['summed']
        .sum())
print(df1)
   datetime cputime  summed
0  2019-07-18 11:00       5
1  2019-07-18 11:30       4
2  2019-07-18 12:00       3
3  2019-07-18 12:30       2
4  2019-07-18 13:00       2
