Ted Vu

Reputation: 149

Label elements in a duplicate list

Is there any good way to do this:

    input  = ['hi you', 'hello', 'hi you', 'hello', 'good bye']
    output = [1, 2, 1, 2, 3] 

Many thanks!!

(I just edited the input list. My actual case is the new list of strings above, not a list of single letters.)

Upvotes: 2

Views: 222

Answers (4)

Seyi Daniel

Reputation: 2379

You can get it done this way:

    input1 = ['hi you', 'hello', 'hi you', 'hello', 'good bye']

    # idx_dict maps each unique value to the label assigned when it first appears
    idx_dict, result, counter = {}, [], 1
    for i in input1:
        if i not in idx_dict:  # first occurrence: assign the next label
            idx_dict[i] = counter
            counter += 1
        result.append(idx_dict[i])  # append the label stored for this value

This solves the problem in a single pass over the list, i.e. n iterations for n elements, since each dictionary lookup takes constant time on average.
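As a rough sketch, the loop above can be wrapped in a reusable function. The name label_by_first_seen is hypothetical and not part of the answer, and the sketch assumes the list elements are hashable.

```python
def label_by_first_seen(items):
    """Return 1-based labels, numbered in order of first appearance."""
    labels = {}   # value -> label assigned when the value was first seen
    result = []
    for item in items:
        if item not in labels:
            labels[item] = len(labels) + 1  # next unused label
        result.append(labels[item])
    return result


print(label_by_first_seen(['hi you', 'hello', 'hi you', 'hello', 'good bye']))
# [1, 2, 1, 2, 3]
```

Using len(labels) + 1 instead of a separate counter keeps the label and the dictionary size in sync automatically.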

Upvotes: 0

juanpa.arrivillaga

Reputation: 96277

The most time-efficient way is to build a mapping from each value to a label that records the order in which it was first encountered:

    >>> data = ['a', 'b', 'a', 'b', 'c']
    >>> index = {}
    >>> for x in data:
    ...     if x not in index:
    ...         index[x] = len(index) + 1
    ...
    >>> index
    {'a': 1, 'b': 2, 'c': 3}

Then simply map the original data:

    >>> [index[x] for x in data]
    [1, 2, 1, 2, 3]
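A more compact variant of the same idea, offered as a sketch rather than as part of the answer, uses dict.setdefault. It returns the stored label when the key already exists and stores the default otherwise. The default argument len(index) + 1 is evaluated before the call, so it is the next free label whenever the key is new.

```python
data = ['hi you', 'hello', 'hi you', 'hello', 'good bye']

index = {}
# setdefault returns the existing label for x, or stores and returns the default.
labels = [index.setdefault(x, len(index) + 1) for x in data]

print(labels)  # [1, 2, 1, 2, 3]
print(index)   # {'hi you': 1, 'hello': 2, 'good bye': 3}
```

This builds the mapping and the result in one pass, at the cost of a side effect inside the comprehension, which some readers find less clear than the explicit loop.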

Upvotes: 1

mousetail

Reputation: 8010

The ord() function returns the Unicode code point of a single character. For example, ord('a') == 97.

In Unicode, as in most other character encodings, the letters a to z are stored in consecutive order. You can therefore get the zero-based position of any lowercase letter by subtracting ord('a'), for example: ord('b') - ord('a') == 1 and ord('z') - ord('a') == 25. Add one to get a 1-based index.

Using this knowledge, we can build a comprehension that does what you want:

    output = [ord(i) - ord('a') + 1 for i in input]

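This gives the desired result when every element is a single lowercase letter, as in ['a', 'b', 'a', 'b', 'c']. It does not work for multi-character strings such as 'hi you', because ord() raises a TypeError for anything other than a single character. If the list contains capital letters or symbols, the results may also look strange. For example, ord('A') == 65, so a capital A is replaced by -31. To treat capital letters the same as lowercase ones, use: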

    output = [ord(i.lower()) - ord('a') + 1 for i in input]
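A minimal sketch of the same idea with input validation, assuming labels should only be produced for single ASCII letters. The helper name letter_position is hypothetical, and str.isascii() requires Python 3.7 or later.

```python
def letter_position(ch):
    """Return the 1-based alphabet position of a single ASCII letter."""
    if len(ch) != 1 or not (ch.isascii() and ch.isalpha()):
        raise ValueError(f"expected a single ASCII letter, got {ch!r}")
    return ord(ch.lower()) - ord('a') + 1


print([letter_position(c) for c in ['a', 'b', 'a', 'b', 'c']])  # [1, 2, 1, 2, 3]
print(letter_position('Z'))  # 26
```

Raising an error for anything else makes it obvious when the input is a list of longer strings, where this approach does not apply.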

Upvotes: 1

mousetail

Reputation: 8010

You could solve it like this:

    output = [input.index(i) for i in input]

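Each value in output is the index of the first occurrence of the corresponding element of input, so the labels are positions rather than consecutive numbers: the example list gives [0, 1, 0, 1, 4]. If you want the numbering to start at one, use: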

    output = [input.index(i) + 1 for i in input]

(Though you probably want to avoid using the names of built-in functions such as input for variables.)
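If consecutive labels are needed, the first-occurrence positions can be ranked afterwards. The following is only a sketch, and the names items, positions, and rank are illustrative. Because list.index scans the list on every call, this takes quadratic time on long lists.

```python
items = ['hi you', 'hello', 'hi you', 'hello', 'good bye']

# Position of the first occurrence of each element (list.index is a linear scan).
positions = [items.index(x) for x in items]          # [0, 1, 0, 1, 4]

# Rank the distinct positions so the labels become consecutive, starting at 1.
rank = {pos: n for n, pos in enumerate(sorted(set(positions)), start=1)}
labels = [rank[pos] for pos in positions]

print(labels)  # [1, 2, 1, 2, 3]
```

For large inputs, a dictionary-based single pass is likely the better choice, since it avoids the repeated scans.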

Upvotes: 1
