Reputation: 5299
I have the dataframe below:
A B
1 a0
2 b0
3 b1
4 a1
5 b2
6 a2
First, I would like to split df at every row where B starts with "a" (i.e. startswith("a")):
df1
A B
1 a0
2 b0
3 b1
df2
A B
4 a1
5 b2
df3
A B
6 a2
Then I would like to count the rows in each piece and summarize the result.
My desired result is below
rows
a0 3
a1 2
a2 1
How can this be done?
Upvotes: 1
Views: 125
Reputation: 214957
You can convert the cells that do not start with a to missing values, forward fill the series, and then call value_counts:
df.B.where(df.B.str.startswith("a"), None).ffill().value_counts()
#a0 3
#a1 2
#a2 1
#Name: B, dtype: int64
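For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the frame from the question. The variable names (is_start, labels, counts) are mine, and it uses where() with its default fill value (NaN) rather than passing None, which should behave the same for this purpose.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "B": ["a0", "b0", "b1", "a1", "b2", "a2"],
})

# True on rows that open a new block (B starts with "a").
is_start = df["B"].str.startswith("a")

# Keep only the block-opening labels, blank out the rest,
# then carry each label forward over the rows that follow it.
labels = df["B"].where(is_start).ffill()

# Count how many rows fall under each label.
counts = labels.value_counts()
print(counts)
```

Note that this assumes the first row starts with a; any rows before the first a would stay missing after ffill and be dropped by value_counts.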
If the same a value can appear more than once, you can differentiate the repeated blocks by creating an additional group variable with cumsum:
start_a = df.B.str.startswith("a")
df.groupby(by = [df.B.where(start_a, None).ffill(), start_a.cumsum().rename('g')]).size()
#B g # here is an extra group variable to differentiate possible duplicated a rows
#a0 1 3
#a1 2 2
#a2 3 1
#dtype: int64
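To see why the extra group variable matters, here is a small sketch on made-up data (not from the question) in which the label a0 opens two separate blocks. The frame and the names labels and sizes are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: "a0" starts two different blocks.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "B": ["a0", "b0", "a1", "b1", "b2", "a0"],
})

start_a = df["B"].str.startswith("a")

# Forward-filled block label, as in the first approach.
labels = df["B"].where(start_a).ffill()

# Running count of block starts: increases by 1 at every "a" row,
# so each block gets its own id even when labels repeat.
g = start_a.cumsum().rename("g")

# Group by (label, block id) so the two "a0" blocks stay separate.
sizes = df.groupby([labels, g]).size()
print(sizes)
```

With value_counts alone, the two a0 blocks here would be merged into a single count of 3; grouping on the block id as well keeps them apart as 2 and 1.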
Upvotes: 2