Reputation: 103

How to split pandas dataframe into groups based on index of rows

I have a dataframe, and wherever a row's index is more than 1 greater than the previous row's index (for example, if it jumps from 73 to 75 or anything higher), I want to split the dataframe at that point into separate dataframes. How can I achieve this?

Upvotes: 0

Answers (1)

DSM

Reputation: 353059

This can be done using a variant of the usual compare-cumsum-groupby pattern, applied to the index instead of a column. (At least if the index is otherwise well behaved, e.g. numeric and sorted.) For example:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": list("abcde")}, index=[1, 2, 4, 5, 8])
>>> df
   A
1  a
2  b
4  c
5  d
8  e
>>> grouped = df.groupby((df.index.to_series().diff() > 1).cumsum())
>>> for group_id, group in grouped:
...     print("group id:", group_id)
...     print(group)
...     print()
...     
group id: 0
   A
1  a
2  b

group id: 1
   A
4  c
5  d

group id: 2
   A
8  e

To get at the frames directly, you could use something like frames = [g for k, g in grouped].
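
As a minimal sketch, the same idea can be wrapped in a small helper that returns the pieces as a list. The function name split_on_index_gaps and the max_step parameter are hypothetical, introduced only for illustration, and the sketch assumes a sorted, numeric index:

```python
import pandas as pd


def split_on_index_gaps(df, max_step=1):
    """Split df wherever consecutive index values differ by more than max_step.

    Hypothetical helper for illustration; assumes a sorted, numeric index.
    """
    # True at each row that starts a new run. The first diff is NaN, and
    # NaN > max_step is False, so the first row lands in group 0.
    starts_new_run = df.index.to_series().diff() > max_step
    # Running count of run starts gives one integer label per run.
    group_ids = starts_new_run.cumsum()
    return [group for _, group in df.groupby(group_ids)]


df = pd.DataFrame({"A": list("abcde")}, index=[1, 2, 4, 5, 8])
frames = split_on_index_gaps(df)
for frame in frames:
    print(frame.index.tolist())
```

Each element of frames is an ordinary DataFrame that keeps its original index labels, so the pieces can be processed independently or concatenated back together later.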

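This works because we can use diff to measure the jumps in the index (after converting it to a Series). If we then take the cumulative sum of the bools marking where the difference is greater than 1, we get an increasing group id, one per run of consecutive index values. The first difference is NaN, and NaN > 1 is False, so the first row always starts group 0: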

>>> df.index.to_series().diff()
1    NaN
2    1.0
4    2.0
5    1.0
8    3.0
dtype: float64
>>> df.index.to_series().diff() > 1
1    False
2    False
4     True
5    False
8     True
dtype: bool
>>> (df.index.to_series().diff() > 1).cumsum()
1    0
2    0
4    1
5    1
8    2
dtype: int64
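
The three steps above can also be sketched with plain NumPy, which may make the mechanics easier to see. This is an illustrative variant under the same assumption of a sorted, numeric index, not a replacement for the approach shown; the names steps, group_ids and runs are made up for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("abcde")}, index=[1, 2, 4, 5, 8])

# Differences between consecutive index values; one element shorter
# than the index because the first row has no predecessor.
steps = np.diff(df.index.to_numpy())

# A step larger than 1 starts a new run. The first row always belongs
# to run 0, so prepend a 0 to the cumulative count of run starts.
group_ids = np.concatenate(([0], np.cumsum(steps > 1)))

# groupby accepts an array of labels with the same length as the frame.
runs = {int(key): group for key, group in df.groupby(group_ids)}
print(group_ids.tolist())
```

Here the explicit leading 0 plays the role that the NaN comparison plays in the Series version: it keeps the first row in the first group.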

Upvotes: 4
