Reputation: 5875
I needed a python function which would take a list of strings in the form of:
seq = ['A[0]','B[2:5]','A[4]']
and return a new list of "expanded" elements with preserved order, like so:
expanded = ['A[0]', 'B[2]', 'B[3]', 'B[4]', 'B[5]', 'A[4]']
To achieve my goal I wrote this simple function:
def expand_seq(seq):
#['item[i]' for item in seq for xrange in item]
return ['%s[%s]'%(item.split('[')[0],i) for item in seq for i in xrange(int(item.split('[')[-1][:-1].split(':')[0]),int(item.split('[')[-1][:-1].split(':')[-1])+1)]
When dealing with a sequence which would generate less than 500k items it works well, but it slows down quite a bit when generating very large lists (more 1 million). For example:
# let's generate 10 million items!
seq = ['A[1:5000000]','B[1:5000000]']
t1 = time.clock()
seq = expand_seq(seq)
t2 = time.clock()
print round(t2-t1, 3)
# RESULT: 9.541 seconds
I'm looking for ways to improve this function and hopefully speed it up when dealing with large lists. If anyone has suggestions, I would love to hear them!
Upvotes: 0
Views: 108
Reputation: 20005
The following seems to give a 35% speedup:
import re
r = re.compile(r"(\w+)\[(\d+)(?::(\d+))?\]")
def expand_seq(seq):
result = []
for item in seq:
m = r.match(item)
name, start, end = m.group(1), int(m.group(2)), m.group(3)
rng = xrange(start, int(end)) if end else (start,)
t = name + "["
result.extend(t + str(i) + "]" for i in rng)
return result
With this code:
Upvotes: 2
Reputation: 6684
I'm not sure you'll get a dramatic speedup since you can't fundamentally improve the algorithm. I did get about a 20% speedup from yours by doing it this way:
def expand_seq(seq):
expanded = []
for s in seq:
name, indices = s[0:-1].split("[")
if ":" in indices:
index1, index2 = [int(i) for i in indices.split(":")]
else:
index1 = int(indices)
index2 = index1
for n in range(index1, index2 + 1):
expanded.append("{}[{}]".format(name, n))
return expanded
I think the speedup is largely due to not repeating some operations (like int
and split
) which you had to do to keep your solution to a one-liner.
As has been suggested though, if you use a generator you can start consuming the results instantly. Like this:
def expand_seq(seq):
for s in seq:
name, indices = s[0:-1].split("[")
if ":" in indices:
index1, index2 = [int(i) for i in indices.split(":")]
else:
index1 = int(indices)
index2 = index1
for n in range(index1, index2 + 1):
yield "{}[{}]".format(name, n)
Upvotes: 0