Reputation: 3309
I'm trying to split a pandas data frame by the repeating ranges in column A. My data and output are as follows. The ranges in columns A are always increasing and do not skip values. The values in column A do start and stop arbitrarily, however.
import pandas as pd
dict = {"A": [1,2,3,2,3,4,3,4,5,6],
"B": ["a","b","c","d","e","f","g","h","i","k"]}
df = pd.DataFrame(dict)
df
A B
0 1 a
1 2 b
2 3 c
3 2 d
4 3 e
5 4 f
6 3 g
7 4 h
8 5 i
9 6 k
df1
A B
0 1 a
1 2 b
2 3 c
df2
A B
0 2 d
1 3 e
2 4 f
df3
A B
0 3 g
1 4 h
2 5 i
3 6 k
Thanks for advice!
from timeit import default_timer as timer
start = timer()
for x ,y in df.groupby(df.A.diff().ne(1).cumsum()):
print(y)
end = timer()
aa = end - start
start = timer()
s = (df.A.diff() != 1).cumsum()
g = df.groupby(s)
for _,g_ in g:
print(g_)
end = timer()
bb = end - start
start = timer()
[*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))]
print(*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum())), sep='\n\n')
end = timer()
cc = end - start
print(aa,bb,cc)
0.0176649530000077 0.018132143000002543 0.018715283999995336
Upvotes: 3
Views: 397
Reputation: 294488
because that's important
[*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))]
print(*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum())), sep='\n\n')
A B
0 1 a
1 2 b
2 3 c
A B
3 2 d
4 3 e
5 4 f
A B
6 3 g
7 4 h
8 5 i
9 6 k
df1, df2, df3 = (d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))
Upvotes: 2
Reputation: 59274
Just groupby
by the difference
s = (df.A.diff() != 1).cumsum()
g = df.groupby(s)
for _,g_ in g:
print(g_)
Outputs
A B
0 1 a
1 2 b
2 3 c
A B
3 2 d
4 3 e
5 4 f
A B
6 3 g
7 4 h
8 5 i
9 6 k
Upvotes: 3
Reputation: 323326
Create the groupby
key by using diff
and cumsum
for x ,y in df.groupby(df.A.diff().ne(1).cumsum()):
print(y)
A B
0 1 a
1 2 b
2 3 c
A B
3 2 d
4 3 e
5 4 f
A B
6 3 g
7 4 h
8 5 i
9 6 k
Upvotes: 3