Reputation: 111
I have a few questions on how to use pandas dataframes. I would like to group my table in two separate ways.
First, I'd like to group entries into certain time intervals. Let's say I want to group the following entries into 3 minute intervals. How do I do that ?
Second, I'd like to define a certain order of strings in the 4th column as one event (i.e from one 'reward' to the next one). Next, I'd like to group 5 subsequent events together.
Does that make any sense? Anyone could help me? I guess if one knows pandas, it'd be a piece of cake.
Thanks a lot and all the best.
4914 2015-03-31 19:56:34 trialIdle 1 0 0 4915 2015-03-31 19:56:36 trialIdle 1 0 0 4916 2015-03-31 19:56:39 trialIdle 1 0 0 4917 2015-03-31 19:56:39 trialIdle 1 1 0 4918 2015-03-31 19:56:51 trialIdle 1 0 0 4919 2015-03-31 19:56:51 reward 0 0 0 4920 2015-03-31 19:56:58 trialTimout 1 1 0 4921 2015-03-31 19:57:06 trialIdle 1 1 0 4922 2015-03-31 19:57:09 trialIdle 1 1 0 4923 2015-03-31 19:57:09 trialIdle 1 1 0 4924 2015-03-31 19:57:12 trialIdle 1 1 0 4925 2015-03-31 19:57:12 trialIdle 1 1 0 4926 2015-03-31 19:57:12 reward 0 0 0 4927 2015-03-31 19:57:16 trialTimout 1 1 0 4928 2015-03-31 19:57:31 trialIdle 1 1 0 4929 2015-03-31 19:57:44 trialIdle 1 0 0 4930 2015-03-31 19:57:45 trialIdle 1 0 0 4931 2015-03-31 19:58:54 trialIdle 1 0 0 4932 2015-03-31 19:59:56 trialIdle 1 1 0 4933 2015-03-31 19:59:56 reward 0 0 0 4934 2015-03-31 19:59:57 trialTimout 1 1 0 4935 2015-03-31 20:00:02 trialTimout 1 0 0 4936 2015-03-31 20:00:05 trialIdle 1 0 0 4937 2015-03-31 20:00:18 trialIdle 1 0 0 4938 2015-03-31 20:00:19 trialIdle 1 1 0 4939 2015-03-31 20:00:40 trialIdle 1 1 0 4940 2015-03-31 20:00:51 trialIdle 1 0 0 4941 2015-03-31 20:00:51 reward 0 0 0
Upvotes: 1
Views: 1663
Reputation: 5121
It is a bit hard to know how your data is structured from the dump you provided. I will assume the date and time information represents one column in your DataFrame, which is stored as a series of pandas Timestamps. If not, you will need to get it into this format. For me, this is in column 1 ...
In [15]: print(df.head())
1 2 3 4 5
0
4914 2015-03-31 19:56:34 trialIdle 1 0 0
4915 2015-03-31 19:56:36 trialIdle 1 0 0
4916 2015-03-31 19:56:39 trialIdle 1 0 0
4917 2015-03-31 19:56:39 trialIdle 1 1 0
4918 2015-03-31 19:56:51 trialIdle 1 0 0
Next, we make the index the timestamp ...
In [16]: df.index = pd.DatetimeIndex(df[1])
[Edit] Make a new column for each category in column 2
In [17]: for col in df[2].unique():
....: df[col] = (df[2] == col).astype(int)
Then we resample
In [18]: dfrs = df.resample('3min', how=sum)
And let's see what we have got
In [19]: print(dfrs)
3 4 5 trialIdle reward trialTimout
2015-03-31 19:54:00 6 2 0 5 1 1
2015-03-31 19:57:00 12 9 0 10 2 2
2015-03-31 20:00:00 6 2 0 5 1 1
[Edit] In terms of grouping events together - the following code looks at the reward line and all subsequent lines up until (and excluding) the next reward line. I have called this a reward group (rwg).
df['rwg'] = np.nan
df.loc[df[2]=='reward', 'rwg'] = range(1, sum(df[2]=='reward')+1 )
df['rwg'] = df['rwg'].ffill() # bfill() if you want to group the other way
df['rwg'] = df['rwg'].fillna(0) # the initial group without a preceding reward
dfrwg = df.groupby('rwg').sum()
print(dfrwg)
Note: if you want to group the lines leading up to and including a reward-line, use bfill() rather than ffill() above.
The ffill() approach yields
In [119]: print(dfrwg)
3 4 5 trialIdle reward trialTimout
rwg
0 5 1 0 5 0 0
1 6 6 0 5 1 1
2 6 3 0 5 1 1
3 7 3 0 5 1 2
4 0 0 0 0 1 0
Upvotes: 1