Reputation: 317
I wrote the following function to add indexes to duplicates in a series:
(["foo", "foo", "foo", "bar", "bar"] becomes ["foo 1", "foo 2", "foo 3", "bar 1", "bar 2"])
def indexer(series):
    all_labels = []
    for title in set(series):
        label = []
        i = 0
        while i < len(series):
            if title == series.iloc[i]:
                label.append(title)
            i += 1
        all_labels.append(label)
    final = []
    for item in all_labels:
        if len(item) > 1:
            for i, label in enumerate(item):
                final.append(label + " " + str(i+1))
        else:
            final.append(item[0])
    return final
There is obviously a better and cleaner way to do this, probably using pandas groupby and agg (although I'm not sure how they behave with a single Series instead of a DataFrame). Could someone please shed some light on how to do it? Thanks.
Upvotes: 1
Views: 741
Reputation: 8940
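If values that appear only once need to be left alone:
import pandas as pd
df = pd.Series(['foo', 'foo', 'foo', 'bar', 'bar', 'John'])  # a Series, despite the name
mylist = list(df)
# Number each value by how often it has appeared so far, but only if it occurs more than once overall
m = map(lambda x: x[1]+ " " + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))
m = list(m)
df = pd.Series(m)
df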
Output:
0    foo 1
1    foo 2
2    foo 3
3    bar 1
4    bar 2
5     John
dtype: object
John didn't get any number with him. Hurray!
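The `count` calls inside the lambda rescan the list for every element, which can get slow on a long series. A minimal sketch of the same idea in a single pass, assuming a hypothetical helper named `label_duplicates` (not part of the answer above) and `collections.Counter` for the bookkeeping:

```python
from collections import Counter


def label_duplicates(values):
    """Append a running index to repeated values; leave unique ones alone."""
    totals = Counter(values)  # how often each value occurs overall
    seen = Counter()          # how often each value has occurred so far
    labelled = []
    for value in values:
        seen[value] += 1
        if totals[value] > 1:
            labelled.append(f"{value} {seen[value]}")
        else:
            labelled.append(value)
    return labelled


result = label_duplicates(["foo", "foo", "foo", "bar", "bar", "John"])
```

The resulting list can be wrapped back into a Series with `pd.Series(result)` in the same way as above.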
Upvotes: 2
Reputation: 45752
If it's a DataFrame you can use groupby to find a cumulative count, which is the label you want to concatenate to each of your strings. Note that the groups don't have to be in order:
import pandas as pd
df = pd.DataFrame(["foo", "foo", "bar", "bar", "foo"], columns=["baz"])
labels = df.groupby("baz").cumcount() + 1
df["baz"] + " " + labels.astype(str)
which results in
0    foo 1
1    foo 2
2    bar 1
3    bar 2
4    foo 3
dtype: object
However, this will also add the 1 label to any unique values. Did you want those to remain unchanged? I assumed not, since you're starting the others at 1 instead of leaving the first in each group unchanged.
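If unique values should stay unchanged after all, and the data is a plain Series rather than a DataFrame, something along these lines might work, since a Series can be grouped by its own values. This is only a rough sketch: the names `s`, `counts`, `repeated`, and `labelled` are illustrative, and the extra "John" entry is added here to show a unique value.

```python
import pandas as pd

s = pd.Series(["foo", "foo", "bar", "bar", "foo", "John"])

# Running count within each group of equal values, starting at 1.
counts = s.groupby(s).cumcount() + 1

# True for every value that occurs more than once anywhere in the Series.
repeated = s.duplicated(keep=False)

# Keep unique values as they are; append the count everywhere else.
labelled = s.where(~repeated, s + " " + counts.astype(str))
```

`where` keeps the original value wherever its condition is True, so passing `~repeated` leaves the unique entries alone and takes the numbered string for the rest.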
Upvotes: 3