Reputation:
Suppose I have a list of elements:
my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']
and I want to convert this list to a list of indexes of unique elements, so that CatA is assigned to index 0, CatB to index 1 and CatC to index 2.
My desired result would be:
result = [0, 1, 2, 0, 0, 2]
Currently I'm doing this by creating a dictionary that assigns each element its unique id
and then using a list comprehension to create the final list of indexes:
import numpy as np
unique_classes = np.unique(my_list)
conversion_dict = dict(zip(unique_classes, range(len(unique_classes))))
result = [conversion_dict[i] for i in my_list]
My question is: is there an easier, more straightforward way of doing this?
I expect to have a large list of categories, so it needs to be efficient, but I would like to avoid manually creating the unique list, the dictionary and the list comprehension.
Upvotes: 4
Views: 1081
Reputation: 164693
As suggested by @mikey, you can use np.unique, as below:
import numpy as np
my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']
res = np.unique(my_list, return_inverse=True)[1]
Result:
[0 1 2 0 0 2]
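One detail worth checking with return_inverse: np.unique returns its unique values sorted, so the indexes refer to sorted order rather than order of first appearance. A minimal sketch, using a made-up `sample` list (not from the question) in which the categories do not first appear in sorted order:

```python
import numpy as np

# Hypothetical input: 'CatC' appears first, but sorts last.
sample = ['CatC', 'CatA', 'CatB', 'CatC']

# uniques is sorted; inverse holds, for each element of sample,
# its position in uniques.
uniques, inverse = np.unique(sample, return_inverse=True)

print(uniques)           # ['CatA' 'CatB' 'CatC']
print(inverse)           # [2 0 1 2]

# Indexing uniques with inverse reconstructs the original list.
print(uniques[inverse])  # ['CatC' 'CatA' 'CatB' 'CatC']
```

For the list in the question the two orders coincide, which is why the result is [0 1 2 0 0 2].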
Upvotes: 3
Reputation: 3469
You can do this with LabelEncoder from scikit-learn. It assigns an integer label to each unique value in the list.
Example code:
from sklearn.preprocessing import LabelEncoder
my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']
le = LabelEncoder()
print(le.fit(my_list).transform(my_list))
Upvotes: 1
Reputation: 519
result = [my_list.index(l) for l in my_list]
print(result)
[0, 1, 2, 0, 0, 2]
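list.index() returns the index of the first occurrence, which gives the required result for this list because the three categories first appear at positions 0, 1 and 2.
For more details, see the documentation for list.index().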
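To see how this behaves on other inputs, here is a small sketch with a made-up `sample` list (not from the question) in which a category repeats before a new one appears. The values returned are positions of first occurrence, so they are not necessarily consecutive, and each list.index() call scans from the start of the list, making the whole comprehension quadratic in the worst case:

```python
# Hypothetical input where 'CatA' repeats before 'CatB' first appears.
sample = ['CatA', 'CatA', 'CatB', 'CatC', 'CatB']

# Each call to list.index() is a linear scan from the beginning.
positions = [sample.index(item) for item in sample]

print(positions)  # [0, 0, 2, 3, 2]
```

Here 'CatB' maps to 2 and 'CatC' to 3, rather than 1 and 2.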
Upvotes: 0
Reputation: 23504
This will do the trick:
my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']
first_occurances = dict()
result = []
for i, v in enumerate(my_list):
    try:
        index = first_occurances[v]
    except KeyError:
        index = i
        first_occurances[v] = i
    result.append(index)
The complexity is O(n).
Basically, the dict stores the index of each value's first occurrence. If first_occurances doesn't contain the value v, we save the current index i.
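If consecutive ids (0, 1, 2, ...) are wanted regardless of where each category first appears, the same single-pass dict idea can be adapted. The following is only a sketch under that assumption; `consecutive_ids` is a hypothetical helper name, and ids are numbered in order of first appearance:

```python
def consecutive_ids(items):
    """Map each item to a consecutive id, numbered by first appearance."""
    ids = {}
    # len(ids) is evaluated before setdefault inserts, so a new item
    # receives the next unused id; a known item keeps its existing id.
    return [ids.setdefault(item, len(ids)) for item in items]


my_list = ['CatA', 'CatB', 'CatC', 'CatA', 'CatA', 'CatC']
print(consecutive_ids(my_list))  # [0, 1, 2, 0, 0, 2]

# Hypothetical input where a category repeats before a new one appears.
print(consecutive_ids(['CatA', 'CatA', 'CatB', 'CatC', 'CatB']))  # [0, 0, 1, 2, 1]
```

This remains a single pass with O(1) average-time dict lookups, so the overall cost is still O(n).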
Upvotes: 2