Reputation: 161
I have the following for loop:
people = queue.Queue()
for person in set(list_):
    first_name, last_name = re.split(',| | ', person)
    people.put([first_name, last_name])
The list being iterated over has 1,000,000+ items. The loop works, but takes a couple of seconds to complete.
What changes can I make to speed up the processing?
Edit: I should add that this is Gevent's queue library
Upvotes: 2
Views: 2253
Reputation: 298156
I would try replacing the regex with something a bit less intensive:
first_name, last_name = person.split(', ')
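To get a rough sense of the difference, here is a minimal timing sketch (the sample data and the `,`-separated format are assumptions for illustration; the original question's exact format isn't shown):

```python
import re
import timeit

# Hypothetical sample data: "first,last" records, repeated many times.
names = ["John,Smith"] * 100_000

# Precompiling the pattern is itself a common speedup over re.split per item.
pattern = re.compile(',| ')

def with_regex():
    return [pattern.split(person) for person in names]

def with_str_split():
    return [person.split(',') for person in names]

# str.split skips the regex engine entirely, so it is usually faster here.
print(timeit.timeit(with_regex, number=10))
print(timeit.timeit(with_str_split, number=10))
```

Both versions produce the same pairs for this data; only the per-item cost differs.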
Upvotes: 0
Reputation: 17246
The question is: what is your queue being used for? If it isn't actually needed for threading purposes (or you can work around the threaded access), then in this kind of situation you want to switch to generators; you can think of them as the Python version of Unix shell pipes. Your loop would then look like:
def generate_people(list_):
    previous_row = None
    for person in sorted(list_):
        if person == previous_row:
            continue
        first_name, last_name = re.split(',| | ', person)
        yield [first_name, last_name]
        previous_row = person
and you would use this generator like this:
for first_name, last_name in generate_people(list_):
    print(first_name, last_name)
This approach avoids what is probably your biggest performance hit: allocating the memory to build a queue and a set with 1,000,000+ items in them. Instead, it works with one pair of strings at a time.
UPDATE
Based on more information about how threads play a role in this, I'd use this solution instead:
people = queue.Queue()
previous_row = None
for person in sorted(list_):
    if person == previous_row:
        continue
    first_name, last_name = re.split(',| | ', person)
    people.put([first_name, last_name])
    previous_row = person
This replaces the set() operation with something that should be more efficient.
Upvotes: 1
Reputation: 1
I think you can read the data with multiple threads, feeding a concurrent queue.
Upvotes: 0
Reputation: 118500
with people.mutex:
    people.queue.extend(list(re.split(',| | ', person)) for person in set(list_))
    people.not_empty.notify_all()
Note that this completely ignores the queue's capacity, but it avoids a lot of per-item locking.
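For context, this trick relies on the internal attributes of the standard library's queue.Queue (`mutex`, `queue`, `not_empty`), which are implementation details rather than documented API. A minimal self-contained sketch, with made-up sample data, showing that items bulk-loaded this way can still be consumed through the normal interface:

```python
import queue
import re

list_ = ["John,Smith", "Jane,Doe"]
people = queue.Queue()

# Bulk-load under the queue's own mutex, bypassing put()'s per-item locking.
with people.mutex:
    people.queue.extend(re.split(',| ', person) for person in set(list_))
    people.not_empty.notify_all()

# Consumers then use the normal API.
while not people.empty():
    print(people.get())
```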
Upvotes: 1