Reputation: 91
I would like to group rows in a pandas dataframe based on the difference between rows. Given the following dataframe
zz = pd.DataFrame([[1,0], [1.1, 2], [2,3], [2.19,4], [5,7], [6,0], [7,2], [8,3], [8.05, 0], [8.12,4]], columns = ['a', 'b'])
I would like to put consecutive rows in the same group when the difference between their values in column 'a' is less than 0.2. So, the following groups (as a DataFrameGroupBy object) would emerge (indices of the rows in brackets) for this dataframe: [0, 1], [2, 3], [4], [5], [6], [7, 8, 9].
I looked around but I could not find an easy solution.
Upvotes: 0
Views: 26
Reputation: 862581
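To number the groups starting from 1, compare the row-to-row differences with lt (strictly less than 0.2), invert the mask, and take the cumulative sum. The first difference is NaN, which compares as False and becomes True after inversion, so the count begins at 1: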
zz['groups'] = (~zz.a.diff().lt(0.2)).cumsum()
print (zz)
      a  b  groups
0  1.00  0       1
1  1.10  2       1
2  2.00  3       2
3  2.19  4       2
4  5.00  7       3
5  6.00  0       4
6  7.00  2       5
7  8.00  3       6
8  8.05  0       6
9  8.12  4       6
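As a minimal sketch of how these labels turn into the grouped object the question asks for, the label series can be passed straight to groupby without storing it as a column. The names key, grouped and index_lists are illustrative and not part of the answer above.

```python
import pandas as pd

zz = pd.DataFrame([[1, 0], [1.1, 2], [2, 3], [2.19, 4], [5, 7], [6, 0],
                   [7, 2], [8, 3], [8.05, 0], [8.12, 4]], columns=['a', 'b'])

# diff() is NaN for the first row; NaN < 0.2 evaluates to False, so the
# inverted mask is True there and the cumulative sum starts at 1.
key = (~zz['a'].diff().lt(0.2)).cumsum()

# Group by the label series directly (hypothetical variable names).
grouped = zz.groupby(key)

# Map each group label to the list of row indices it contains.
index_lists = {int(label): idx.tolist() for label, idx in grouped.groups.items()}
print(index_lists)
# {1: [0, 1], 2: [2, 3], 3: [4], 4: [5], 5: [6], 6: [7, 8, 9]}
```

The grouped object can then be iterated or aggregated like any other groupby result, for example grouped['b'].sum().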
Upvotes: 0
Reputation: 260455
Start a new group whenever the difference from the previous row is greater than or equal to 0.2, then use cumsum to propagate the group number:
zz.groupby(zz['a'].diff().ge(0.2).cumsum())
Intermediate:
zz['group'] = zz['a'].diff().ge(0.2).cumsum()  # append .add(1) to start numbering at 1
      a  b  group
0  1.00  0      0
1  1.10  2      0
2  2.00  3      1
3  2.19  4      1
4  5.00  7      2
5  6.00  0      3
6  7.00  2      4
7  8.00  3      5
8  8.05  0      5
9  8.12  4      5
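A short sketch, assuming the same zz as in the question, that checks the ge(0.2) labels against the inverted lt(0.2) form and looks at the resulting group sizes. The names d, key_ge, key_lt, same_partition and sizes are illustrative only.

```python
import pandas as pd

zz = pd.DataFrame([[1, 0], [1.1, 2], [2, 3], [2.19, 4], [5, 7], [6, 0],
                   [7, 2], [8, 3], [8.05, 0], [8.12, 4]], columns=['a', 'b'])

d = zz['a'].diff()

# A True marks the first row of a new group; cumsum turns marks into labels.
key_ge = d.ge(0.2).cumsum()      # first diff is NaN -> False -> labels start at 0
key_lt = (~d.lt(0.2)).cumsum()   # first diff is NaN -> True after inversion -> labels start at 1

# On this data the two labelings differ only by the offset of 1.
same_partition = key_ge.add(1).equals(key_lt)
print(same_partition)  # True

# Number of rows in each group, in label order.
sizes = zz.groupby(key_ge).size().tolist()
print(sizes)  # [2, 2, 1, 1, 1, 3]
```

The two forms only disagree where diff() is NaN: the inverted mask treats a NaN difference as the start of a group, while ge(0.2) does not. That matters if column 'a' can contain missing values beyond the first row's unavoidable NaN difference.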
Upvotes: 1