user1917152

Reputation: 47

How to make this code more efficient in Python?

I'm having trouble making this nested for loop run efficiently. I need to run these loops on a string s that is about 90,000 characters long. Can anyone provide any tips?

This code is supposed to take a string and chop it into pieces of length n, where each piece is a contiguous part of the original string. It then counts how many distinct pieces there are for each n, up to the length of the string.

For example, 'GATTACAT' with n = 2 produces ['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']. Taking the set of this gives {'GA', 'AT', 'TT', 'TA', 'AC', 'CA'}, and the program returns its length, 6.

The program does this for every n from 1 to len('GATTACAT') and sums all the set lengths.
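A minimal sketch of the procedure just described, written as a plain function (the name `distinct_counts` is hypothetical, not from the question). It slices the string directly, since slicing a `str` already yields a `str`, and reproduces the GATTACAT example:

```python
def distinct_counts(s):
    """Return the number of distinct substrings of each length 1..len(s)."""
    counts = []
    for n in range(1, len(s) + 1):
        # All contiguous pieces of length n, deduplicated with a set
        pieces = {s[i:i + n] for i in range(len(s) - n + 1)}
        counts.append(len(pieces))
    return counts


counts = distinct_counts('GATTACAT')
print(counts)       # [4, 6, 6, 5, 4, 3, 2, 1]
print(sum(counts))  # 31
```

The n = 2 entry is the 6 from the example above; the total the program is after is the sum of the list.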

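sumS = 0
ind = list(s)  # assumed: ind (not shown in the question) holds the characters of s
for m in range(1, len(s)+1):  # each substring length m
    sublist = list()
    for n in range(0, len(s)-m+1):  # each start position n
        sublist.append(''.join(ind[n:n+m]))
    sumS += len(set(sublist))  # add the number of distinct substrings of length m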
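Summing the set sizes over every length m counts each distinct non-empty substring of s exactly once, so the loop computes the number of distinct substrings of s. The loop materializes all of them, roughly len(s)**2 / 2 strings with on the order of len(s)**3 characters in total, which is why it struggles at 90,000 characters. A different technique, not taken from the question or the answers, is a suffix automaton, which counts distinct substrings in roughly linear time. This is a minimal sketch under that approach (the function name `count_distinct_substrings` is hypothetical):

```python
def count_distinct_substrings(s):
    """Count distinct non-empty substrings of s with a suffix automaton."""
    # Parallel arrays, one entry per state:
    # length = longest string in the state, link = suffix link, trans = transitions
    length = [0]
    link = [-1]
    trans = [{}]
    last = 0  # state for the whole prefix read so far
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        # Add transitions on ch from suffixes that do not have one yet
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q by cloning it with a shorter length
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Each non-root state v stands for substrings with lengths
    # length[link[v]] + 1 .. length[v], all distinct across states
    return sum(length[v] - length[link[v]] for v in range(1, len(length)))


print(count_distinct_substrings('GATTACAT'))  # 31
```

For 'GATTACAT' this gives 31, the same total as summing the per-length set sizes, but without ever building the substrings themselves.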

thanks!

Upvotes: 1

Views: 297

Answers (2)

kojiro

Reputation: 77167

Some quick ideas come to mind:

slen = 1 + len(s)  # do this once, not a bunch of times in the loop
sumS = 0
for m in range(1, slen):
    sublist = [''.join(ind[n:n+m]) for n in range(slen-m)]  # list comps are usually faster than loops
    sumS += len(set(sublist))

Actually you can probably do it as a larger comprehension:

slen = 1 + len(s)
sumS = sum(len(set(''.join(ind[n:n+m]) for n in range(slen-m))) for m in range(1, slen))

If you have Python 2.7 or Python 3, you can use a set comprehension to build each set directly instead of building a list and passing it to set().
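A sketch of the same total written with a set comprehension, slicing `s` directly instead of joining characters from `ind` (the function name `total_distinct` is hypothetical):

```python
def total_distinct(s):
    # For each length m, a set comprehension collects the distinct substrings;
    # the outer generator expression sums their counts.
    return sum(len({s[n:n + m] for n in range(len(s) - m + 1)})
               for m in range(1, len(s) + 1))


print(total_distinct('GATTACAT'))  # 31
```

This still builds every substring, so it trims constant factors but does not change the roughly cubic amount of work for long strings.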

Upvotes: 2

user2665694


>>> s = 'GATTACAT'

>>> [s[i:i+2] for i in range(len(s)-1)]
['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']

>>> [s[i:i+3] for i in range(len(s)-2)]
['GAT', 'ATT', 'TTA', 'TAC', 'ACA', 'CAT']
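The two slices above generalize to any piece length m, with range(len(s) - m + 1) start positions. A small sketch (the helper name `pieces` is hypothetical):

```python
def pieces(s, m):
    # Every contiguous substring of length m, in order of starting position
    return [s[i:i + m] for i in range(len(s) - m + 1)]


s = 'GATTACAT'
print(pieces(s, 2))            # ['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']
print(len(set(pieces(s, 2))))  # 6
```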

Upvotes: 0
