Reputation: 1114
I have a data frame with two columns.
data=DataFrame({'a':[1,2,1,4,1,1,3,1,4,1,1,1],'b':[5,2,8,3,10,3,5,15,45,41,23,9]})
a b
0 1 5
1 2 2
2 1 8
3 4 3
4 1 10
5 1 3
6 3 5
7 1 15
8 4 45
9 1 41
10 1 23
11 1 9
I want to partition the DataFrame into groups based on the value of Column 'a' exceeding a given threshold. in this case the threshold is 3, so I want a new group whenever column 'a' exceeds the value of 3.
I want my groups to look like this
a b
0 1 5
1 2 2
2 1 8
a b
0 4 3
1 1 10
2 1 3
3 3 5
4 1 15
a b
0 4 45
1 1 41
2 1 23
3 1 9
The only thing I can think of is applying cummax
data['max']=data.b.cummax()
but his will only works in limited situations where the values greater than 3 are always getting bigger.
Upvotes: 1
Views: 553
Reputation: 3961
Example:
I:
a = np.array([[1,2,3], [4,5,6]])
print(a)
O:
array([[1, 2, 3],
[4, 5, 6]])
I:
np.cumsum(a)
O:
array([ 1, 3, 6, 10, 15, 21])
Upvotes: 0
Reputation: 36623
You can create a MultiIndex based on the values of a
and apply it to the data frame. This will allow you to perform slicing operations on the data later.
There may be a faster way to do this, but you can create a boolean series of each time a
is greater than a threshold (in this case 3), use cumsum
to create an index for each group, count the occurrences in each group, and use that count to create an additional level to the index.
new_index = ((data.a>3).cumsum().value_counts(sort=False)
.apply(range).apply(pd.Series).stack().index)
data2 = data.set_index(new_index)
data2
# returns:
a b
0 0 1 5
1 2 2
2 1 8
1 0 4 3
1 1 10
2 1 3
3 3 5
4 1 15
2 0 4 45
1 1 41
2 1 23
3 1 9
This allows you to access each of the desired groups by calling the first index.
data2.loc[0]
# returns:
a b
0 1 5
1 2 2
2 1 8
Upvotes: 0
Reputation: 214957
You can use (data.a > 3).cumsum
; This creates the desired group variable you need:
threshold = 3
for k, g in data.groupby((data.a > threshold).cumsum()):
print(g, "\n")
# a b
#0 1 5
#1 2 2
#2 1 8
# a b
#3 4 3
#4 1 10
#5 1 3
#6 3 5
#7 1 15
# a b
#8 4 45
#9 1 41
#10 1 23
#11 1 9
Upvotes: 2