maggu
maggu

Reputation: 111

Grouping pandas data frame by time intervals

I have a few questions on how to use pandas dataframes. I would like to group my table in two separate ways.

First, I'd like to group entries into certain time intervals. Let's say I want to group the following entries into 3 minute intervals. How do I do that ?

Second, I'd like to define a certain order of strings in the 4th column as one event (i.e from one 'reward' to the next one). Next, I'd like to group 5 subsequent events together.

Does that make any sense? Anyone could help me? I guess if one knows pandas, it'd be a piece of cake.

Thanks a lot and all the best.

4914  2015-03-31  19:56:34    trialIdle    1     0    0
4915  2015-03-31  19:56:36    trialIdle    1     0    0
4916  2015-03-31  19:56:39    trialIdle    1     0    0
4917  2015-03-31  19:56:39    trialIdle    1     1    0
4918  2015-03-31  19:56:51    trialIdle    1     0    0
4919  2015-03-31  19:56:51       reward    0     0    0
4920  2015-03-31  19:56:58  trialTimout    1     1    0
4921  2015-03-31  19:57:06    trialIdle    1     1    0
4922  2015-03-31  19:57:09    trialIdle    1     1    0
4923  2015-03-31  19:57:09    trialIdle    1     1    0
4924  2015-03-31  19:57:12    trialIdle    1     1    0
4925  2015-03-31  19:57:12    trialIdle    1     1    0
4926  2015-03-31  19:57:12       reward    0     0    0
4927  2015-03-31  19:57:16  trialTimout    1     1    0
4928  2015-03-31  19:57:31    trialIdle    1     1    0
4929  2015-03-31  19:57:44    trialIdle    1     0    0
4930  2015-03-31  19:57:45    trialIdle    1     0    0
4931  2015-03-31  19:58:54    trialIdle    1     0    0
4932  2015-03-31  19:59:56    trialIdle    1     1    0
4933  2015-03-31  19:59:56       reward    0     0    0
4934  2015-03-31  19:59:57  trialTimout    1     1    0
4935  2015-03-31  20:00:02  trialTimout    1     0    0
4936  2015-03-31  20:00:05    trialIdle    1     0    0
4937  2015-03-31  20:00:18    trialIdle    1     0    0
4938  2015-03-31  20:00:19    trialIdle    1     1    0
4939  2015-03-31  20:00:40    trialIdle    1     1    0
4940  2015-03-31  20:00:51    trialIdle    1     0    0
4941  2015-03-31  20:00:51       reward    0     0    0

Upvotes: 1

Views: 1663

Answers (1)

Mark Graph
Mark Graph

Reputation: 5121

It is a bit hard to know how your data is structured from the dump you provided. I will assume the date and time information represents one column in your DataFrame, which is stored as a series of pandas Timestamps. If not, you will need to get it into this format. For me, this is in column 1 ...

In [15]: print(df.head())
                         1           2  3  4  5
0                                              
4914   2015-03-31 19:56:34   trialIdle  1  0  0
4915   2015-03-31 19:56:36   trialIdle  1  0  0
4916   2015-03-31 19:56:39   trialIdle  1  0  0
4917   2015-03-31 19:56:39   trialIdle  1  1  0
4918   2015-03-31 19:56:51   trialIdle  1  0  0

Next, we make the index the timestamp ...

In [16]: df.index = pd.DatetimeIndex(df[1])

[Edit] Make a new column for each category in column 2

In [17]: for col in df[2].unique():
   ....:     df[col] = (df[2] == col).astype(int)

Then we resample

In [18]: dfrs = df.resample('3min', how=sum)

And let's see what we have got

In [19]: print(dfrs)
                      3  4  5  trialIdle  reward  trialTimout
2015-03-31 19:54:00   6  2  0          5       1            1
2015-03-31 19:57:00  12  9  0         10       2            2
2015-03-31 20:00:00   6  2  0          5       1            1

[Edit] In terms of grouping events together - the following code looks at the reward line and all subsequent lines up until (and excluding) the next reward line. I have called this a reward group (rwg).

df['rwg'] = np.nan
df.loc[df[2]=='reward', 'rwg'] = range(1, sum(df[2]=='reward')+1 )
df['rwg'] = df['rwg'].ffill()   # bfill() if you want to group the other way
df['rwg'] = df['rwg'].fillna(0) # the initial group without a preceding reward
dfrwg = df.groupby('rwg').sum()
print(dfrwg)

Note: if you want to group the lines leading up to and including a reward-line, use bfill() rather than ffill() above.

The ffill() approach yields

In [119]: print(dfrwg)
     3  4  5  trialIdle  reward  trialTimout
rwg                                         
0    5  1  0          5       0            0
1    6  6  0          5       1            1
2    6  3  0          5       1            1
3    7  3  0          5       1            2
4    0  0  0          0       1            0

Upvotes: 1

Related Questions