Reputation: 863
I have the following pandas dataframe (just an example):
import pandas as pd
df = pd.DataFrame(pd.Series(['a','a','a','b','b','c','c','c','c','b','c','a']), columns = ['Data'])
Data
0 a
1 a
2 a
3 b
4 b
5 c
6 c
7 c
8 c
9 b
10 c
11 a
The goal is to get another column, Stats, that count the element of Data column as following:
Data Stats
0 a
1 a
2 a a3
3 b
4 b b2
5 c
6 c
7 c
8 c c4
9 b b1
10 c c1
11 a a1
Where, for example, a3 means "three consecutive a elements", c4 means "four consecutive c elements" and so on...
Thank you in advance for your help
Upvotes: 3
Views: 1208
Reputation: 862406
Create helper Series
s
for consecutive values of column Data
, get count per groups by GroupBy.transform
and last repalce duplicated values to empty string
s:
s = df['Data'].ne(df['Data'].shift()).cumsum()
a = df.groupby(s)['Data'].transform('size')
df['Stats'] = np.where(~s.duplicated(keep='last'), df['Data'] + a.astype(str), '')
print (df)
Data Stats
0 a
1 a
2 a a3
3 b
4 b b2
5 c
6 c
7 c
8 c c4
9 b b1
10 c c1
11 a a1
Detail:
print (s)
0 1
1 1
2 1
3 2
4 2
5 3
6 3
7 3
8 3
9 4
10 5
11 6
Name: Data, dtype: int32
print (a)
0 3
1 3
2 3
3 2
4 2
5 4
6 4
7 4
8 4
9 1
10 1
11 1
Name: Data, dtype: int64
Without removing duplicates solution is simplier:
df['Stats'] = df['Data'] + a.astype(str)
print (df)
Data Stats
0 a a3
1 a a3
2 a a3
3 b b2
4 b b2
5 c c4
6 c c4
7 c c4
8 c c4
9 b b1
10 c c1
11 a a1
Upvotes: 1
Reputation: 164613
Here's one way using groupby
:
counts = df.groupby((df['Data'] != df['Data'].shift()).cumsum()).cumcount() + 1
df['Stats'] = np.where(df['Data'] != df['Data'].shift(-1),
df['Data'] + counts.astype(str), '')
print(df)
Data Stats
0 a
1 a
2 a a3
3 b
4 b b2
5 c
6 c
7 c
8 c c4
9 b b1
10 c c1
11 a a1
Upvotes: 2