Add index to duplicated items in Pandas Series

Question

I wrote the following function to add indexes to duplicates in a series:

(["foo", "foo", "foo", "bar", "bar"] becomes ["foo 1", "foo 2", "foo 3", "bar 1", "bar 2"])

def indexer(series):
  all_labels = []
  for title in set(series): 
    label = []
    i = 0
    while i < len(series): 
      if title == series.iloc[i]:
        label.append(title)
      i += 1
    all_labels.append(label)
  final = []
  for item in all_labels:
    if len(item) > 1:
      for i, label in enumerate(item):
        final.append(label + " " + str(i+1))
    else:
      final.append(item[0])
  return final

There is obviously a better and cleaner way to do this, probably using Pandas groupby and agg (although I'm not sure how they behave with a single Series instead of a DataFrame). Would someone please shed some light on how to do it? Thanks

Dan · Accepted Answer

If it's a DataFrame, you can use groupby with cumcount to get a running count within each group, which is the label you want to concatenate to each string. Note that the members of a group don't have to be adjacent:

import pandas as pd

df = pd.DataFrame(["foo", "foo", "bar", "bar", "foo"], columns=["baz"])
labels = df.groupby("baz").cumcount() + 1  # cumcount() starts at 0 within each group
df["baz"] + " " + labels.astype(str)

which results in

0    foo 1
1    foo 2
2    bar 1
3    bar 2
4    foo 3
dtype: object
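
Since the question asks about a Series rather than a DataFrame, here is a minimal sketch of the same idea on a Series, grouping it by its own values. The variable names (s, labels, result) are only illustrative:

```python
import pandas as pd

s = pd.Series(["foo", "foo", "bar", "bar", "foo"])

# Group the Series by its own values; cumcount() numbers the rows
# within each group starting from 0, so add 1 to start at 1.
labels = s.groupby(s).cumcount() + 1

# Append the per-group counter to each original string.
result = s + " " + labels.astype(str)
print(result.tolist())
```

The result keeps the original row order and index, so it can be assigned straight back to the Series.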

However, this will also add the label 1 to any unique values. Did you want those to remain unchanged? I assumed not, since you're starting the others at 1 instead of leaving the first in each group unchanged.
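
If you do want values that occur only once left untouched, one possible sketch is to build the numbered strings as before and then keep them only where the value is actually repeated. This assumes that "unique" means the value appears exactly once in the whole Series; the names is_repeated and unique_kept are hypothetical:

```python
import pandas as pd

s = pd.Series(["foo", "foo", "bar", "baz", "foo"])

# Number every row within its group of equal values, starting at 1.
labels = s.groupby(s).cumcount() + 1
numbered = s + " " + labels.astype(str)

# duplicated(keep=False) is True for every row whose value occurs
# more than once, including the first occurrence.
is_repeated = s.duplicated(keep=False)

# Keep the numbered string for repeated values, the original otherwise.
unique_kept = numbered.where(is_repeated, s)
print(unique_kept.tolist())
```

Here "bar" and "baz" each occur once and stay as they are, while the three "foo" rows are numbered 1 to 3.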
