I have a dataframe, and whenever the index of a row is more than 1 greater than the index of the previous row (for example, if it jumps from 73 to 75 or higher), I want to split the rows at that point into separate dataframes. How can I achieve this?
This can be done using a variant of the usual compare-cumsum-groupby pattern, applied to the index instead of a column. (This assumes the index is numeric and sorted in increasing order.) For example:
>>> import pandas as pd
>>> df = pd.DataFrame({"A": list("abcde")}, index=[1, 2, 4, 5, 8])
>>> df
   A
1  a
2  b
4  c
5  d
8  e
>>> grouped = df.groupby((df.index.to_series().diff() > 1).cumsum())
>>> for group_id, group in grouped:
...     print("group id:", group_id)
...     print(group)
...     print()
...
group id: 0
   A
1  a
2  b

group id: 1
   A
4  c
5  d

group id: 2
   A
8  e

You can get the frames themselves as a list with frames = [g for _, g in grouped].
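If you need this in more than one place, the same pattern can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name split_on_index_gaps and its max_step parameter are hypothetical, and it assumes a numeric index sorted in increasing order.

```python
import pandas as pd


def split_on_index_gaps(df: pd.DataFrame, max_step: int = 1) -> list[pd.DataFrame]:
    """Hypothetical helper: split df wherever the index jumps by more than max_step.

    Assumes a numeric index sorted in increasing order.
    """
    # Step between each index value and the previous one (NaN for the first row).
    steps = df.index.to_series().diff()
    # True where a new run starts; NaN > max_step is False, so the first row is in run 0.
    run_ids = (steps > max_step).cumsum()
    # groupby yields (run_id, sub_frame) pairs in run order; keep only the frames.
    return [group for _, group in df.groupby(run_ids)]


if __name__ == "__main__":
    df = pd.DataFrame({"A": list("abcde")}, index=[1, 2, 4, 5, 8])
    for frame in split_on_index_gaps(df):
        print(frame, end="\n\n")
```

Raising max_step lets you tolerate small gaps, e.g. max_step=2 keeps 1, 2, 4, 5 together and only splits before 8.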
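This works because diff (after converting the index to a Series) measures the jump between each index value and the previous one. Comparing that with 1 gives True wherever a new group should start, and taking the cumulative sum of those booleans gives a group number that increases by one at each gap. The first row's diff is NaN, and NaN > 1 is False, so the first group is numbered 0: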
>>> df.index.to_series().diff()
1    NaN
2    1.0
4    2.0
5    1.0
8    3.0
dtype: float64
>>> df.index.to_series().diff() > 1
1    False
2    False
4     True
5    False
8     True
dtype: bool
>>> (df.index.to_series().diff() > 1).cumsum()
1    0
2    0
4    1
5    1
8    2
dtype: int64
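If you'd rather not build an intermediate Series, the same labels and split points can be computed with NumPy directly. This is a swapped-in variant sketched under the same assumption (numeric, increasing index); the variable names are illustrative only. np.diff returns one element fewer than its input, so a 0 is prepended for the first row, and the split positions are offset by 1 for the same reason.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("abcde")}, index=[1, 2, 4, 5, 8])

idx = df.index.to_numpy()
# Jump between consecutive index values; one shorter than idx.
steps = np.diff(idx)                                   # array([1, 2, 1, 3])
# Group labels: 0 for the first row, then +1 at every jump larger than 1.
labels = np.concatenate(([0], np.cumsum(steps > 1)))   # array([0, 0, 1, 1, 2])
# Row positions where a new run starts (+1 because np.diff drops the first row).
starts = np.flatnonzero(steps > 1) + 1                 # array([2, 4])
# Slice the frame between consecutive start positions; iloc keeps the original index.
bounds_lo = np.r_[0, starts]
bounds_hi = np.r_[starts, len(df)]
frames = [df.iloc[lo:hi] for lo, hi in zip(bounds_lo, bounds_hi)]

for frame in frames:
    print(frame, end="\n\n")
```

The labels array matches the cumsum output above, so df.groupby(labels) would give the same groups; slicing with iloc just avoids the groupby step.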