Reputation: 47
I'm having trouble running this nested for loop efficiently. I need to run these loops on a string s whose length is about 90,000. Can anyone provide any tips?
This code is supposed to take a string and chop it into pieces of length n, where each piece is a contiguous part of the original string. The program then computes the size of the set of pieces for each n up to the length of the string.
For example, GATTACAT with n = 2 produces the pieces ['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']. Taking the set of these gives {'GA', 'AT', 'TT', 'TA', 'AC', 'CA'}, and the program returns its length, 6.
The program is to do this for n = 1 to n = len('GATTACAT') and sum all the set lengths.
for m in range(1, len(s)+1):
    sublist = list()
    for n in range(0, len(s)-m+1):
        sublist.append(''.join(ind[n:n+m]))
    sumS += len(set(sublist))
thanks!
Upvotes: 1
Views: 297
Reputation: 77167
Some quick ideas come to mind:
slen = 1 + len(s)  # do this once, not a bunch of times in the loop
for m in range(1, slen):
    sublist = [''.join(ind[n:n+m]) for n in range(slen-m)]  # list comps are usually faster than loops
    sumS += len(set(sublist))
Actually you can probably do it as a larger comprehension:
slen = 1 + len(s)
sumS += sum(len(set(''.join(ind[n:n+m]) for n in range(slen-m))) for m in range(1,slen))
If you have Python 2.7 or Python 3, use a set comprehension instead of the list comprehension above.
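For illustration, here is a minimal sketch of the set-comprehension variant written as a complete function. It assumes s is a plain str and slices it directly instead of joining pieces of ind; the function name sum_of_distinct_piece_counts is made up for this example and does not come from the question.

```python
def sum_of_distinct_piece_counts(s):
    """Sum, over every piece length m, the number of distinct length-m pieces of s."""
    total = 0
    for m in range(1, len(s) + 1):
        # Set comprehension: duplicates are dropped as the pieces are generated,
        # so no intermediate list is built.
        pieces = {s[n:n + m] for n in range(len(s) - m + 1)}
        total += len(pieces)
    return total


if __name__ == "__main__":
    print(sum_of_distinct_piece_counts("GATTACAT"))  # prints 31
```

This still builds every piece, so it is only a constant-factor improvement; it is probably fine for short strings but will likely remain slow at a length of 90,000.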
Upvotes: 2
Reputation:
>>> s = 'GATTACAT'
>>> [s[i:i+2] for i in range(len(s)-1)]
['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']
>>> [s[i:i+3] for i in range(len(s)-2)]
['GAT', 'ATT', 'TTA', 'TAC', 'ACA', 'CAT']
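Slicing the string directly, as in the session above, avoids the join, but any approach that materializes all pieces still handles on the order of len(s)**2 slices. Since the sum of the set sizes over all lengths is just the number of distinct non-empty substrings of s, one alternative (a different technique, not something proposed in this thread) is to count them with a suffix automaton, which takes roughly linear time. The sketch below is an assumption-laden illustration; the function name count_distinct_substrings is hypothetical.

```python
def count_distinct_substrings(s):
    """Count the distinct non-empty substrings of s using a suffix automaton."""
    # Per-state data: longest substring length, suffix link, outgoing transitions.
    length = [0]
    link = [-1]
    trans = [{}]
    last = 0  # state that represents the whole prefix processed so far

    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})

        # Walk the suffix links, adding the missing transition on ch.
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]

        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone copies q's transitions but has a shorter length.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur

    # Each non-initial state v accounts for length[v] - length[link[v]] substrings.
    return sum(length[v] - length[link[v]] for v in range(1, len(length)))


if __name__ == "__main__":
    print(count_distinct_substrings("GATTACAT"))  # prints 31
```

For GATTACAT this agrees with summing the per-length set sizes (4 + 6 + 6 + 5 + 4 + 3 + 2 + 1 = 31). It only gives the total, so if the individual per-length counts are needed, a different method would be required.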
Upvotes: 0