Python Convert Strings in a List to Integers Based on Order

Question

Given this list of strings:

list=['foo','foo','foo','bar','bar','baz','baz','baz']

I'd like to get a list of the corresponding numbers, as if this were an index with tied ranks, like this:

numbers=[0,0,0,1,1,2,2,2]

Thanks in advance!

ShadowRanger · Accepted Answer

Assuming the strings are already grouped (all repeated strings are consecutive), the lowest-overhead way to do this is with itertools.groupby:

from itertools import groupby

numbers = [i for i, (_, g) in enumerate(groupby(mylist)) for _ in g]

This groups consecutive equal entries in mylist (renamed from list, which is a terrible name for a variable because it shadows the list constructor) and produces i, the zero-based count of groups seen so far, once for each entry in the group. We don't care what the values are, hence for _ in g, where _ indicates that the loop variable is unused.
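As a quick check, here is the same comprehension run on the list from the question, followed by an input whose duplicates are not adjacent. This is only an illustrative sketch; the name ungrouped is made up for the example.

```python
from itertools import groupby

mylist = ['foo', 'foo', 'foo', 'bar', 'bar', 'baz', 'baz', 'baz']

# One group index per run of equal neighbours, repeated once per member of the run
numbers = [i for i, (_, g) in enumerate(groupby(mylist)) for _ in g]
print(numbers)  # [0, 0, 0, 1, 1, 2, 2, 2]

# groupby only merges adjacent equal items, so a value that reappears
# later starts a new group and gets a new number
ungrouped = ['d', 'f', 'd']  # hypothetical input, not from the question
ungrouped_numbers = [i for i, (_, g) in enumerate(groupby(ungrouped)) for _ in g]
print(ungrouped_numbers)  # [0, 1, 2]
```

The second result shows why the consecutive-duplicates assumption matters for this approach.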

If repeated values might be non-consecutive but should still get the same group number (that is, ['d', 'f', 'd'] might occur, and should produce [0, 1, 0] rather than [0, 1, 2]), you'd use a different approach. It also works for the consecutive-only case, but it requires persistent, growing state that the groupby approach avoids:

from collections import defaultdict
from itertools import count

# If the key was seen already, returns its stored value; otherwise stores and returns the next unused integer group number
grouptracker = defaultdict(count().__next__)  # .next on Py2

numbers = [grouptracker[x] for x in mylist]

Or to one-line it for fun and inscrutability (don't actually do this):

numbers = list(map(defaultdict(count().__next__).__getitem__, mylist))
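To make the defaultdict behavior concrete, here is a small self-contained run on an input with non-adjacent duplicates. The sample list is invented for illustration; the technique is the one shown above.

```python
from collections import defaultdict
from itertools import count

# Hypothetical input: 'd' and 'f' each reappear after other values
mylist = ['d', 'f', 'd', 'foo', 'f']

# A missing key calls count().__next__, which hands out 0, 1, 2, ... in order
# of first appearance; the result is stored, so later lookups reuse it
grouptracker = defaultdict(count().__next__)

numbers = [grouptracker[x] for x in mylist]
print(numbers)             # [0, 1, 0, 2, 1]
print(dict(grouptracker))  # {'d': 0, 'f': 1, 'foo': 2}
```

The tracker holds one entry per distinct string, which is the growing state mentioned above; as a side effect it also leaves you with the string-to-number mapping if you need it later.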
