Reputation: 53
I have a relatively big dataset that I want to split into multiple dataframes in Python based on a column containing a datetime object. The values in the column (that I want to split the dataframe by) are given in the following format:
2015-11-01 00:00:05
You may assume the dataframe looks like this.
How can I split the dataframe into 5-second intervals in the following way:
1st dataframe 2015-11-01 00:00:00 - 2015-11-01 00:00:05
,
2nd dataframe 2015-11-01 00:00:05 - 2015-11-01 00:00:10
, and so on.
I also need to count the number of observations in each of resulting dataframes. In other, words, it would be nice if I could get another dataframe with 2 columns (the desired output format can be found below):
Upvotes: 2
Views: 337
Reputation: 862641
Create dictionary of DataFrame
s and add new column with assign
:
rng = pd.date_range('2015-11-01 00:00:00', periods=100, freq='S')
df = pd.DataFrame({'Date': rng, 'a': range(100)})
print (df.head(10))
Date a
0 2015-11-01 00:00:00 0
1 2015-11-01 00:00:01 1
2 2015-11-01 00:00:02 2
3 2015-11-01 00:00:03 3
4 2015-11-01 00:00:04 4
5 2015-11-01 00:00:05 5
6 2015-11-01 00:00:06 6
7 2015-11-01 00:00:07 7
8 2015-11-01 00:00:08 8
9 2015-11-01 00:00:09 9
g = df.groupby(pd.Grouper(key='Date', freq='5S'))
dfs = {k.strftime('%Y-%m-%d %H:%M:%S'):v.assign(A=range(1,len(v)+1), B=len(v)) for k,v in g}
print (dfs['2015-11-01 00:00:05'])
Date a A B
5 2015-11-01 00:00:05 5 1 5
6 2015-11-01 00:00:06 6 2 5
7 2015-11-01 00:00:07 7 3 5
8 2015-11-01 00:00:08 8 4 5
9 2015-11-01 00:00:09 9 5 5
If need count rows first aggreagte size
and for Interval
is add 1 to index:
df1 = df.groupby(pd.Grouper(key='Date', freq='5S')).size().reset_index(name='Count')
df1['Interval'] = df1.index + 1
print (df1.head())
Date Count Interval
0 2015-11-01 00:00:00 5 1
1 2015-11-01 00:00:05 5 2
2 2015-11-01 00:00:10 5 3
3 2015-11-01 00:00:15 5 4
4 2015-11-01 00:00:20 5 5
Upvotes: 3