Pandas truncate DataFrame after a column condition is met

Question

So I have the following DataFrame df:

enter image description here

The frame contains two groups of data that are sorted within that group.

Group 1 is from index 359 to 365 inclusive

Group 2 is from index 366 to 371 inclusive

I want to separate them into the two groups. There may be more than two groups. The logic I am applying is whenever the next STEPS_ID is less than the current STEPS_ID, this marks the end of the group.

I am easily able to get this pointer by df.STEPS_ID <= df.STEPS_ID.shift(-1)

Is there an elegant pandas way to achieve this easily possibly using vectorized operations rather than for loop?

This seems to be a common enough problem that I am sure there must be a well-defined algorithm to solve these kinds of problems. I would also appreciate if you guys could guide me in reading up on the theoretical basis for such algorithms.

unutbu · Accepted Answer

There is more than one way to "separate things into groups". One way would be to make a list of groups. But that is not the ideal way when dealing with a Pandas DataFrame. Once you have a list, you are forced to loop over the list in a Python loop. Those are comparatively slow compared to native Pandas operations.

Assuming you have enough memory, a better way would be to add an column or index to the DataFrame:

import pandas as pd
df = pd.DataFrame({'STEPS_ID':range(1107,1113)*2})
df['GROUP'] = (df['STEPS_ID'] < df['STEPS_ID'].shift(1)).astype('int').cumsum()
# df.set_index('GROUP', inplace=True, append=True)
print(df)

yields

    STEPS_ID  GROUP
0       1107      0
1       1108      0
2       1109      0
3       1110      0
4       1111      0
5       1112      0
6       1107      1
7       1108      1
8       1109      1
9       1110      1
10      1111      1
11      1112      1

Now you can do aggregation/transformation operations on each group by calling

df.groupby('GROUP')....

Pandas truncate DataFrame after a column condition is met

Answers (1)

Related Questions