branwen85

Reputation: 1698

How can I speed up an iteration in Python?

I was wondering if you could help me speed up my Python script.

I have two lists:

a=['a','b','c','d','e','f','g','h','i','j']

b=['b','f','g','j']

I want to create a list with the same length as a, in which every element of a that also appears in b is replaced by its index in b, and every element that is not in b is replaced by a placeholder, let's say '-999'. For the lists above it would look like this:

c=['-999',0,'-999','-999','-999', 1, 2,'-999','-999',3] 

My code for now is:

c=[]

counter=0

for each in a:
    if each in b:
        c.append(counter)
        counter+=1
    else:
        c.append('-999')

It works fine. However, in real life my list a is 600,000 elements long, and there are actually 7 b lists that I need to run it against, each between 3k and 250k elements long.

Any ideas on how to speed this up?

Upvotes: 2

Views: 2697

Answers (2)

Ashwini Chaudhary

Reputation: 250931

If the elements in b are unique then you can try this:

In [76]: a=['a','b','c','d','e','f','g','h','i','j']

In [77]: b=['b','f','g','j']

In [78]: dic={x:i for i,x in enumerate(b)}

In [79]: dic
Out[79]: {'b': 0, 'f': 1, 'g': 2, 'j': 3}

In [81]: [dic.get(x,'-999') for x in a]
Out[81]: ['-999', 0, '-999', '-999', '-999', 1, 2, '-999', '-999', 3]

For repeated items you can use defaultdict(list) (run from collections import defaultdict first):

In [102]: a=['a','b','c','d','e','f','g','b','h','i','f','j']

In [103]: b=['b','f','g','j','b','f']

In [104]: dic=defaultdict(list)

In [105]: for i,x in enumerate(b):
   .....:     dic[x].append(i)
   .....:     

# Now convert every value (i.e. each list) in dic to an iterator.

In [106]: dic={x:iter(y) for x,y in dic.items()}  

In [107]: [next(dic[x]) if x in dic else '-999' for x in a]  #call next() if the key 
                                                             #is present else use '-999'
Out[107]: ['-999', 0, '-999', '-999', '-999', 1, 2, 4, '-999', '-999', 5, 3]
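Since there are several b lists to process, the two steps above can be wrapped in one function and called once per list. This is only a minimal sketch: the name index_map and the missing parameter are made up for illustration. Unlike the session above, it passes a default to next(), so if an item occurs more often in a than in b, the extra occurrences get '-999' instead of raising StopIteration.

```python
from collections import defaultdict

def index_map(a, b, missing='-999'):
    """For each item of a, return its position in b, or `missing` if absent.

    Repeated items in b are handed out left to right; once they run out,
    further occurrences in a fall back to `missing`.
    """
    positions = defaultdict(list)
    for i, x in enumerate(b):
        positions[x].append(i)
    # One iterator per distinct item, so each index is used only once.
    iters = {x: iter(idxs) for x, idxs in positions.items()}
    return [next(iters[x], missing) if x in iters else missing for x in a]

a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'b', 'h', 'i', 'f', 'j', 'b']
b = ['b', 'f', 'g', 'j', 'b', 'f']
print(index_map(a, b))
# ['-999', 0, '-999', '-999', '-999', 1, 2, 4, '-999', '-999', 5, 3, '-999']
```

Building the lookup table costs one pass over b, and each lookup after that takes constant time on average, so the total work grows with len(a) + len(b) rather than len(a) * len(b).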

Upvotes: 6

ATOzTOA

Reputation: 35950

Something simpler:

a=['a','b','c','d','e','f','g','h','i','j']

b=['b','f','g','j']

for i,x in enumerate(a):
    a[i] = b.index(x) if x in b else -999

Output:

[-999, 0, -999, -999, -999, 1, 2, -999, -999, 3]

Analysis:

OP's method:

>>> 
len(a) = 10000
len(b) = 5000
Time = 0:00:01.063000

Method 1:

c=[]
for i,x in enumerate(a):
    c.append(b.index(x) if x in b else -999)

>>> 
len(a) = 10000
len(b) = 5000
Time = 0:00:01.109000

Ashwini Chaudhary method:

>>> 
len(a) = 10000
len(b) = 5000
Time = 0:00:00
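The timings above can be reproduced roughly with a sketch like the following. It is not the script that produced those numbers: the test data (b as every second element of a) and the function names list_scan and dict_lookup are assumptions made for illustration, and actual times will vary by machine.

```python
import timeit

def list_scan(a, b):
    # The question's loop: `each in b` scans the list b on every iteration.
    c, counter = [], 0
    for each in a:
        if each in b:
            c.append(counter)
            counter += 1
        else:
            c.append('-999')
    return c

def dict_lookup(a, b):
    # Build the index table once; each lookup is constant time on average.
    dic = {x: i for i, x in enumerate(b)}
    return [dic.get(x, '-999') for x in a]

# Hypothetical data at the sizes quoted above; b is every second element of a,
# so both functions produce the same result.
a = [str(n) for n in range(10000)]
b = a[::2]

assert list_scan(a, b) == dict_lookup(a, b)

for fn in (list_scan, dict_lookup):
    seconds = timeit.timeit(lambda: fn(a, b), number=1)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

Note that the two functions only agree when b lists its elements in the same order they appear in a, because the question's counter counts matches in a rather than looking up positions in b.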

Upvotes: 0
