Splitting pandas dataframe into many chunks

Question

Let's say I have a dataframe with the following structure:

    observation
d1  1
d2  1
d3  -1
d4  -1
d5  -1
d6  -1
d7  1
d8  1
d9  1
d10 1
d11 -1
d12 -1
d13 -1  
d14 -1
d15 -1
d16 1
d17 1
d18 1
d19 1
d20 1

Here d1:d20 is some datetime index (generalized for this example).

If I wanted to split d1:d2, d3:d6, d7:d10, etc. into their own respective "chunks", how would I do that pythonically?

Note:

df1 = df[(df.observation==1)]
df2 = df[(df.observation==-1)]

is not what I want.

I can think of brute-force ways that would work, but they are not very elegant.

akuiper · Accepted Answer

You can build a group variable from the cumsum() of the diff() of the observation column. Wherever diff() is not equal to zero the comparison yields True, so every time the value changes, cumsum() produces a new group id. You can then either apply standard analysis after groupby() with df.groupby((df.observation.diff() != 0).cumsum())...(other chained analysis here), or split the frame into smaller data frames with a list comprehension:

lst = [g for _, g in df.groupby((df.observation.diff() != 0).cumsum())]

lst[0]
# observation
#d1         1
#d2         1

lst[1]
# observation
#d3        -1
#d4        -1
#d5        -1
#d6        -1
...
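
To see why this works, here is a self-contained sketch that rebuilds the example frame and prints the intermediate group ids. The string labels d1..d20 stand in for the real datetime index, and the names changed, group_id and chunks are only illustrative:

```python
import pandas as pd

# Rebuild the question's frame; string labels replace the datetime index (assumption).
values = [1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
df = pd.DataFrame({"observation": values}, index=[f"d{i}" for i in range(1, 21)])

# diff() is NaN on the first row and non-zero wherever the value flips,
# so the comparison is True exactly at the start of each run.
changed = df.observation.diff() != 0

# Summing the booleans gives a counter that increases by one at every run start.
group_id = changed.cumsum()

# One sub-frame per consecutive run, in original order.
chunks = [g for _, g in df.groupby(group_id)]

print(group_id.tolist())
print([len(c) for c in chunks])
```

If the column is not numeric, diff() is not available; comparing the column with its own shift(), as in df.observation != df.observation.shift(), should give the same run-start mask.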

The index of each chunk:

[i.index for i in lst]

#[Index(['d1', 'd2'], dtype='object'),
# Index(['d3', 'd4', 'd5', 'd6'], dtype='object'),
# Index(['d7', 'd8', 'd9', 'd10'], dtype='object'),
# Index(['d11', 'd12', 'd13', 'd14', 'd15'], dtype='object'),
# Index(['d16', 'd17', 'd18', 'd19', 'd20'], dtype='object')]
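
As an illustration of the "other chained analysis" route, the following sketch summarizes each run without materializing the sub-frames. It assumes the same example data with string labels in place of the datetime index; run_id, labels and summary are hypothetical names chosen for this example:

```python
import pandas as pd

# Same example data as in the question; string labels replace the datetime index (assumption).
values = [1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
df = pd.DataFrame({"observation": values}, index=[f"d{i}" for i in range(1, 21)])

# Group key: increases by one each time the observation changes.
run_id = (df.observation.diff() != 0).cumsum()

# The index as a Series, so its first/last label per run can be taken with groupby.
labels = df.index.to_series()

# One row per run: its value, first and last index label, and number of rows.
summary = pd.DataFrame({
    "value": df.observation.groupby(run_id).first(),
    "start": labels.groupby(run_id).first(),
    "end": labels.groupby(run_id).last(),
    "length": df.observation.groupby(run_id).size(),
})

print(summary)
```

This keeps everything in a single frame, which may be more convenient than a list of chunks when only per-run statistics are needed.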
