Daniel Garavito
Daniel Garavito

Reputation: 87

How to auto increment counter by repeteaded values in a column

I Have a data frame with the column name and I need to create the column seq, which allows me identify the different times that a name appears in the data frame, it's important to preserve the order.

import pandas as pd  
  
data = {'name': ['Tom', 'Joseph','Joseph','Joseph', 'Tom', 'Tom', 'John','Tom','Tom','John','Joseph']
        , 'seq': ['Tom 0', 'Joseph 0','Joseph 0','Joseph 0', 'Tom 1', 'Tom 1', 'John 0','Tom 2','Tom 2','John 1','Joseph 1']}  
  
df = pd.DataFrame(data)  
  
print(df) 

      name       seq
0      Tom     Tom 0
1   Joseph  Joseph 0
2   Joseph  Joseph 0
3   Joseph  Joseph 0
4      Tom     Tom 1
5      Tom     Tom 1
6     John    John 0
7      Tom     Tom 2
8      Tom     Tom 2
9     John    John 1
10  Joseph  Joseph 1

Upvotes: 2

Views: 508

Answers (3)

ansev
ansev

Reputation: 30920

You need check for the existence of a new name and then create a new index for each name using groupby and cumsum, the resulting string Series can be concatenated with str.cat

df['seq'] = df['name'].str.cat(
    df['name'].ne(df['name'].shift()).groupby(df['name']).cumsum().sub(1).astype(str),
    sep=' '
)

Upvotes: 1

Corralien
Corralien

Reputation: 120391

Create a boolean mask to know if the name has changed from the previous row. Then filter out the second, third, ... names of a sequence before grouping by name. cumcount increment the sequence number and finally concatenate name and sequence number.

# Boolean mask
m = df['name'].ne(df['name'].shift())

# Create sequence number
seq = df.loc[m].groupby('name').cumcount().astype(str) \
        .reindex(df.index, fill_value=pd.NA).ffill()

# Concatenate name and seq
df['seq'] = df['name'] + ' ' + seq

Output:

>>> df
      name       seq
0      Tom     Tom 0
1   Joseph  Joseph 0
2   Joseph  Joseph 0
3   Joseph  Joseph 0
4      Tom     Tom 1
5      Tom     Tom 1
6     John    John 0
7      Tom     Tom 2
8      Tom     Tom 2
9     John    John 1
10  Joseph  Joseph 1

>>> m
0      True
1      True
2     False
3     False
4      True
5     False
6      True
7      True
8     False
9      True
10     True
Name: name, dtype: bool

Upvotes: 2

Code Different
Code Different

Reputation: 93141

Assuming your data frame is indexes sequentiallly (0, 1, 2, 3, ...):

  1. Group the data frame by name
  2. For each group, apply a gap-and-island algorithm: every time the index jumps by more than 1, create a new island
def sequencer(group):
    idx = group.index.to_series()
    # Every time the index has a gap >1, create a new island
    return idx.diff().ne(1).cumsum().sub(1)

seq = df.groupby('name').apply(sequencer).droplevel(0).rename('seq')
df.merge(seq, left_index=True, right_index=True)

Upvotes: 0

Related Questions