I have a very long list containing 195 distinct integers, with values ranging from 0 to 2399. For example, the number 90 occurs many times, while the number 7 doesn't appear at all.
values = [90, 110, 113, 88, 90, 110, 90, 1370, 90]
I would like to 'tokenise' this, i.e. turn it into a list of integers ranging from 1 to 195, where each distinct value keeps its own unique ID.
Basically, I'd like this output:
new_list = [1, 2, 3, 4, 1, 2, 1, 5, 1]
The goal is to be able to efficiently iterate over the list.
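values = [90, 110, 113, 88, 90, 110, 90, 1370, 90]
d = {}  # maps each original value to a dense ID, starting at 1
# setdefault returns the existing ID, or stores and returns len(d) + 1 for a new value
new_list = [d.setdefault(i, len(d) + 1) for i in values]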
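The same setdefault idea can be wrapped in a small helper that also keeps the reverse mapping, so each new ID can be translated back to the original value. This is only a sketch: the function name `tokenize` and the choice to return a lookup list alongside the dictionary are assumptions for illustration, not part of the answer.

```python
def tokenize(values):
    """Map each distinct value to a dense ID starting at 1, in order of first appearance.

    Returns (new_list, id_of, value_of). value_of[new_id] gives back the original
    value; value_of[0] is a None placeholder so IDs can index it directly.
    """
    id_of = {}
    value_of = [None]  # hypothetical design choice: slot 0 unused because IDs start at 1
    new_list = []
    for v in values:
        # Existing values get their stored ID; new values get len(id_of) + 1
        new_id = id_of.setdefault(v, len(id_of) + 1)
        if new_id == len(value_of):  # a brand-new value was just assigned this ID
            value_of.append(v)
        new_list.append(new_id)
    return new_list, id_of, value_of


values = [90, 110, 113, 88, 90, 110, 90, 1370, 90]
new_list, id_of, value_of = tokenize(values)
print(new_list)                         # [1, 2, 3, 4, 1, 2, 1, 5, 1]
print([value_of[i] for i in new_list])  # [90, 110, 113, 88, 90, 110, 90, 1370, 90]
```

Keeping value_of as a plain list means decoding an ID is a single index operation rather than a reverse dictionary search.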
Like @cricket_007, I question your application. The cost of iterating over a list doesn't depend on the magnitude of the numbers in it. However, if you have a reason to need a dense set of IDs, this is one possible solution. I've kept the building loop simple so you can see how it works; there are some Pythonic improvements you could make, such as using the dictionary get method.
Build a dictionary to translate old to new IDs. Then do the translation in one fell swoop.
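my_list = [90, 110, 113, 88, 90, 110, 90, 1370, 90]

# Map each old ID to a new, dense ID (1, 2, 3, ...) in order of first appearance
new_id_dict = {}
new_id = 0
for old_id in my_list:
    if old_id not in new_id_dict:
        new_id += 1
        new_id_dict[old_id] = new_id

# Translate the whole list in one pass
new_list = [new_id_dict[old_id] for old_id in my_list]
print(new_list)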
Output:
[1, 2, 3, 4, 1, 2, 1, 5, 1]
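On Python 3.7 and later, where dictionaries are guaranteed to preserve insertion order, the translation dictionary can also be built without an explicit loop: dict.fromkeys keeps only the first occurrence of each value, and enumerate(..., start=1) numbers them. This is an alternative sketch rather than the answerer's code, and it relies on that ordering guarantee.

```python
my_list = [90, 110, 113, 88, 90, 110, 90, 1370, 90]

# dict.fromkeys drops duplicates but keeps first-appearance order (Python 3.7+)
new_id_dict = {old_id: new_id
               for new_id, old_id in enumerate(dict.fromkeys(my_list), start=1)}

# Translate the whole list in one pass
new_list = [new_id_dict[old_id] for old_id in my_list]
print(new_list)  # [1, 2, 3, 4, 1, 2, 1, 5, 1]
```

If 0-based IDs are more convenient (for example, to index a list of length 195 directly), drop start=1.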