user2261062

Map elements to list of unique indexes

Suppose I have a list of elements:

my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']

and I want to convert this list to a list of indexes of unique elements.

So CatA is assigned to index 0, CatB to index 1 and CatC to index 2.

My desired result would be:

result = [0, 1, 2, 0, 0, 2]

Currently I'm doing this by creating a dictionary that assigns each element its unique id, and then using a list comprehension to build the final list of indexes:

import numpy as np

unique_classes = np.unique(my_list)
conversion_dict = dict(zip(unique_classes, range(len(unique_classes))))
result = [conversion_dict[i] for i in my_list]

My question is: is there an easier, more straightforward way of doing this?

I expect to have a large list of categories, so the solution needs to be efficient, but I would like to avoid manually creating the unique list, the dictionary and the list comprehension.

Upvotes: 4

Views: 1081

Answers (4)

jpp

Reputation: 164693

As suggested by @mikey, you can use np.unique, as below:

import numpy as np

my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']

res = np.unique(my_list, return_inverse=True)[1]

Result:

[0 1 2 0 0 2]
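
As a small sketch of what the call above returns: with return_inverse=True, np.unique gives back a pair, the sorted unique values and the index of each original element within that sorted array. The names uniques and inverse below are only illustrative. Note that the ids follow sorted order, which happens to coincide with order of first appearance for this particular list.

```python
import numpy as np

my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']

# uniques: sorted distinct values; inverse: position of each element in uniques
uniques, inverse = np.unique(my_list, return_inverse=True)

print(uniques.tolist())  # ['CatA', 'CatB', 'CatC']
print(inverse.tolist())  # [0, 1, 2, 0, 0, 2]

# Indexing uniques with inverse rebuilds the original sequence,
# so the mapping can be undone later if needed.
restored = uniques[inverse].tolist()
print(restored == my_list)  # True
```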

Upvotes: 3

Jay Dangar

Reputation: 3469

You can do this with LabelEncoder from scikit-learn. It assigns an integer label to each unique value in a list.

Example code:

from sklearn.preprocessing import LabelEncoder
my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']
le = LabelEncoder()
print(le.fit(my_list).transform(my_list))

Upvotes: 1

Geek

Reputation: 519

result = [my_list.index(l) for l in my_list]
print(result)
[0, 1, 2, 0, 0, 2]

list.index() returns the index of the first occurrence of a value, which is what your task requires.

For more details, see the documentation for list.index().
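
One caveat worth illustrating: list.index() returns positions, so the ids are only consecutive when every distinct value first appears before any repeat, as in the example list. Each call also scans the list from the start, which makes the comprehension quadratic in the worst case. The sketch below uses a hypothetical input, items, to show the difference, and one possible way to get consecutive ids with dict.fromkeys(); it is an illustration, not the only option.

```python
# Hypothetical input in which a value repeats before a new one appears.
items = ['CatA', 'CatA', 'CatB']

# list.index() gives the position of the first occurrence, so ids can skip numbers.
positions = [items.index(v) for v in items]

# dict.fromkeys() keeps one key per distinct value in first-appearance order
# (dict ordering is guaranteed from Python 3.7), so enumerate() yields dense ids.
dense_id = {v: i for i, v in enumerate(dict.fromkeys(items))}
dense = [dense_id[v] for v in items]

print(positions)  # [0, 0, 2]
print(dense)      # [0, 0, 1]
```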

Upvotes: 0

vishes_shell

Reputation: 23504

This will do the trick:

my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']
first_occurances = dict()
result = []

for i, v in enumerate(my_list):
    try:
        index = first_occurances[v]
    except KeyError:
        index = i
        first_occurances[v] = i
    result.append(index)

Complexity will be O(n).

Basically, you store in a dict the index at which each value first occurs. If first_occurances does not yet contain the value v, we save the current index i for it; otherwise we reuse the index saved earlier.
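
The same first-occurrence logic can be written more compactly with dict.setdefault(), which stores a value only when the key is missing and returns the stored value either way. This is a minimal sketch; the function name first_occurrence_indexes is made up for illustration.

```python
def first_occurrence_indexes(values):
    """Map each value to the index at which it first appears (hypothetical helper)."""
    first_seen = {}
    # setdefault() records i only when v is new and returns the recorded index.
    return [first_seen.setdefault(v, i) for i, v in enumerate(values)]


my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']
print(first_occurrence_indexes(my_list))  # [0, 1, 2, 0, 0, 2]
```

Each element costs one dict operation on average, so the whole pass stays O(n).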

Upvotes: 2
