Reputation: 1698
I was wondering if you could help me speed up my Python script.
I have two lists:
a=['a','b','c','d','e','f','g','h','i','j']
b=['b','f','g','j']
I want to create a list that contains the elements of b but has the length of a, with elements not in b replaced by something else, let's say '-999'. Also, instead of the actual elements (a, b, c, ...), I want to store each element's index in b. So it would look like this:
c=['-999',0,'-999','-999','-999', 1, 2,'-999','-999',3]
My code for now is:
c=[]
counter=0
for each in a:
    if each in b:
        c.append(counter)
        counter+=1
    else:
        c.append('-999')
It works fine; however, in real life my list a is 600,000 elements long, and there are actually 7 b lists that I need to process this way, each between 3k and 250k elements long.
Any ideas on how to speed this up?
Upvotes: 2
Views: 2697
Reputation: 250931
If the elements in b are unique then you can try this:
In [76]: a=['a','b','c','d','e','f','g','h','i','j']
In [77]: b=['b','f','g','j']
In [78]: dic={x:i for i,x in enumerate(b)}
In [79]: dic
Out[79]: {'b': 0, 'f': 1, 'g': 2, 'j': 3}
In [81]: [dic.get(x,'-999') for x in a]
Out[81]: ['-999', 0, '-999', '-999', '-999', 1, 2, '-999', '-999', 3]
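The same idea can be wrapped in a function so that the lookup table is built once per b list and then reused for every element of a. This is only a sketch: the name index_map and the missing parameter are made up for illustration, and it assumes the elements of b are unique and hashable.

```python
def index_map(a, b, missing='-999'):
    """Return a list as long as a holding each element's index in b, or missing."""
    # Build the lookup table once: O(len(b)).
    positions = {x: i for i, x in enumerate(b)}
    # Each dict lookup is O(1) on average, so this pass is O(len(a)).
    return [positions.get(x, missing) for x in a]


a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
b = ['b', 'f', 'g', 'j']
print(index_map(a, b))
```

With several b lists, the function would simply be called once per list, so the total work should grow with len(a) + len(b) for each list rather than with len(a) * len(b).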
For repeated items you can use defaultdict(list) (from the collections module):
In [102]: a=['a','b','c','d','e','f','g','b','h','i','f','j']
In [103]: b=['b','f','g','j','b','f']
In [104]: dic=defaultdict(list)
In [105]: for i,x in enumerate(b):
   .....:     dic[x].append(i)
   .....:
# Now convert every value (i.e. list) present in dic to an iterator.
In [106]: dic={x:iter(y) for x,y in dic.items()}
# Call next() if the key is present, else use '-999'.
In [107]: [next(dic[x]) if x in dic else '-999' for x in a]
Out[107]: ['-999', 0, '-999', '-999', '-999', 1, 2, 4, '-999', '-999', 5, 3]
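One caveat with the repeated-items version: if an item occurs more often in a than in b, next() runs out of indices and raises StopIteration. A possible variant, assuming '-999' should be used in that case as well, passes a default to next(). The function name index_map_repeated is hypothetical.

```python
from collections import defaultdict


def index_map_repeated(a, b, missing='-999'):
    # Collect every position at which each item occurs in b.
    positions = defaultdict(list)
    for i, x in enumerate(b):
        positions[x].append(i)
    # One iterator per item, so repeats in a consume successive indices.
    iterators = {x: iter(idx) for x, idx in positions.items()}
    result = []
    for x in a:
        it = iterators.get(x)
        # next(it, missing) falls back to missing once the indices are used up.
        result.append(missing if it is None else next(it, missing))
    return result


a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'b', 'h', 'i', 'f', 'j']
b = ['b', 'f', 'g', 'j', 'b', 'f']
print(index_map_repeated(a, b))
```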
Upvotes: 6
Reputation: 35950
Something simpler:
a=['a','b','c','d','e','f','g','h','i','j']
b=['b','f','g','j']
for i,x in enumerate(a):
    a[i] = b.index(x) if x in b else -999
Output:
[-999, 0, -999, -999, -999, 1, 2, -999, -999, 3]
Analysis:
OP's method:
>>>
len(a) = 10000
len(b) = 5000
Time = 0:00:01.063000
Method 1:
c=[]
for i,x in enumerate(a):
    c.append(b.index(x) if x in b else -999)
>>>
len(a) = 10000
len(b) = 5000
Time = 0:00:01.109000
Ashwini Chaudhary's method:
>>>
len(a) = 10000
len(b) = 5000
Time = 0:00:00
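A comparison like this can be reproduced with timeit. This is a sketch, not the script that produced the numbers above: the test data (integers, with b being every second element of a) is an assumption, and absolute times will vary by machine.

```python
import timeit

a = list(range(10000))
b = a[::2]  # 5000 elements, all present in a (assumed test data)


def with_list_index():
    # Method 1: "x in b" and b.index(x) each scan the list, O(len(b)) per element.
    return [b.index(x) if x in b else -999 for x in a]


def with_dict():
    # Dict approach: build the table once, then O(1) average lookups.
    positions = {x: i for i, x in enumerate(b)}
    return [positions.get(x, -999) for x in a]


# Both approaches should agree before their timings are compared.
assert with_list_index() == with_dict()

print('list.index:', timeit.timeit(with_list_index, number=1))
print('dict:      ', timeit.timeit(with_dict, number=1))
```

The gap should widen as the lists grow, since the list-based version does work proportional to len(a) * len(b) while the dict-based one stays proportional to len(a) + len(b).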
Upvotes: 0