Reputation: 330
I have a dataframe like this one:
A C1 C2 Total
PRODUCT1 8 11 19
rs1 5 9 14
rs2 2 2 4
rs3 1 0 1
PRODUCT2 21 12 33
rs7 11 7 18
rs2 7 3 10
rs1 3 1 4
rs9 0 1 1
PRODUCT3 2 11 13
rs9 1 6 7
rs5 1 5 6
The column A is made of strings, I want to split my dataframe by the values in this column, specifically every upper word in it. Like this:
df1 =
PRODUCT1 8 11 19
rs1 5 9 14
rs2 2 2 4
rs3 1 0 1
df2 =
PRODUCT2 21 12 33
rs7 11 7 18
rs2 7 3 10
rs1 3 1 4
rs9 0 1 1
df3 =
PRODUCT3 2 11 13
rs9 1 6 7
rs5 1 5 6
Is there an easy way to achieve this?
Upvotes: 2
Views: 354
Reputation: 880369
import pandas as pd
df = pd.DataFrame({'A': ['PRODUCT1', 'rs1', 'rs2', 'rs3', 'PRODUCT2', 'rs7', 'rs2', 'rs1', 'rs9', 'PRODUCT3', 'rs9', 'rs5'], 'C1': [8, 5, 2, 1, 21, 11, 7, 3, 0, 2, 1, 1], 'C2': [11, 9, 2, 0, 12, 7, 3, 1, 1, 11, 6, 5], 'Total': [19, 14, 4, 1, 33, 18, 10, 4, 1, 13, 7, 6]})
for key, group in df.groupby(df['A'].str.isupper().cumsum()):
print(group)
prints
A C1 C2 Total
0 PRODUCT1 8 11 19
1 rs1 5 9 14
2 rs2 2 2 4
3 rs3 1 0 1
A C1 C2 Total
4 PRODUCT2 21 12 33
5 rs7 11 7 18
6 rs2 7 3 10
7 rs1 3 1 4
8 rs9 0 1 1
A C1 C2 Total
9 PRODUCT3 2 11 13
10 rs9 1 6 7
11 rs5 1 5 6
The idea here is to identify rows which are uppercase:
In [95]: df['A'].str.isupper()
Out[95]:
0 True
1 False
2 False
3 False
4 True
5 False
6 False
7 False
8 False
9 True
10 False
11 False
Name: A, dtype: bool
then use cumsum
to take a cumulative sum, where True
is treated as 1 and False
is treated as 0:
In [96]: df['A'].str.isupper().cumsum()
Out[96]:
0 1
1 1
2 1
3 1
4 2
5 2
6 2
7 2
8 2
9 3
10 3
11 3
Name: A, dtype: int64
These values can be used as group numbers. Pass them to df.groupby
to group the DataFrame according to these group numbers. df.groupby(...)
returns an iterable, which lets you loop through the sub-groups.
Upvotes: 4