Split the data frame based on consecutive row values differences

Question

I have a data frame like this,

df
col1    col2    col3
 1        2      3
 2        5      6
 7        8      9
10       11     12
11       12     13
13       14     15
14       15     16

Now I want to create multiple data frames from above when the col1 difference of two consecutive rows are more than 1. So the result data frames will look like,

df1
col1    col2    col3
 1        2      3
 2        5      6
df2
col1    col2    col3
 7        8      9
df3
col1    col2    col3
10       11     12
11       12     13
df4
col1    col2    col3
13       14     15
14       15     16

I can do this using for loop and storing the indices but this will increase execution time, looking for some pandas shortcuts or pythonic way to do this most efficiently.

yatu · Accepted Answer

You could define a custom grouper by taking the diff, checking when it is greater than 1, and take the cumsum of the boolean series. Then group by the result and build a dictionary from the groupby object:

d = dict(tuple(df.groupby(df.col1.diff().gt(1).cumsum())))

print(d[0])
   col1  col2  col3
0     1     2     3
1     2     5     6

print(d[1])
   col1  col2  col3
2     7     8     9

A more detailed break-down:

df.assign(difference=(diff:=df.col1.diff()), 
          condition=(gt1:=diff.gt(1)), 
          grouper=gt1.cumsum())

   col1  col2  col3  difference  condition  grouper
0     1     2     3         NaN      False        0
1     2     5     6         1.0      False        0
2     7     8     9         5.0       True        1
3    10    11    12         3.0       True        2
4    11    12    13         1.0      False        2
5    13    14    15         2.0       True        3
6    14    15    16         1.0      False        3

Split the data frame based on consecutive row values differences

Answers (2)

Related Questions