Dodge
Dodge

Reputation: 3309

Separate pandas df by repeating range in a column

Problem:

I'm trying to split a pandas data frame by the repeating ranges in column A. My data and output are as follows. The ranges in columns A are always increasing and do not skip values. The values in column A do start and stop arbitrarily, however.

Data:

import pandas as pd

dict = {"A": [1,2,3,2,3,4,3,4,5,6],
        "B": ["a","b","c","d","e","f","g","h","i","k"]}

df = pd.DataFrame(dict)

df

   A  B
0  1  a
1  2  b
2  3  c
3  2  d
4  3  e
5  4  f
6  3  g
7  4  h
8  5  i
9  6  k

Desired ouptut:

df1

   A  B
0  1  a
1  2  b
2  3  c

df2

   A  B
0  2  d
1  3  e
2  4  f

df3

   A  B
0  3  g
1  4  h
2  5  i
3  6  k

Thanks for advice!

Answer times:

from timeit import default_timer as timer

start = timer()
for x ,y in df.groupby(df.A.diff().ne(1).cumsum()):
    print(y)
end = timer()
aa = end - start

start = timer()
s = (df.A.diff() != 1).cumsum()
g = df.groupby(s) 
for _,g_ in g:
    print(g_)
end = timer()
bb = end - start

start = timer()
[*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))]
print(*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum())), sep='\n\n')
end = timer()
cc = end - start

print(aa,bb,cc)

0.0176649530000077 0.018132143000002543 0.018715283999995336

Upvotes: 3

Views: 397

Answers (3)

piRSquared
piRSquared

Reputation: 294488

One-liner

because that's important

[*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))]

Print it

print(*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum())), sep='\n\n')

   A  B
0  1  a
1  2  b
2  3  c

   A  B
3  2  d
4  3  e
5  4  f

   A  B
6  3  g
7  4  h
8  5  i
9  6  k

Assign it

df1, df2, df3 = (d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))

Upvotes: 2

rafaelc
rafaelc

Reputation: 59274

Just groupby by the difference

s = (df.A.diff() != 1).cumsum()
g = df.groupby(s)

for _,g_ in g:
    print(g_)

Outputs

   A  B
0  1  a
1  2  b
2  3  c

   A  B
3  2  d
4  3  e
5  4  f

   A  B
6  3  g
7  4  h
8  5  i
9  6  k

Upvotes: 3

BENY
BENY

Reputation: 323326

Create the groupby key by using diff and cumsum

for x ,y in df.groupby(df.A.diff().ne(1).cumsum()):
    print(y)

   A  B
0  1  a
1  2  b
2  3  c
   A  B
3  2  d
4  3  e
5  4  f
   A  B
6  3  g
7  4  h
8  5  i
9  6  k

Upvotes: 3

Related Questions