mikeyy

Reputation: 161

Python - Iterating through a large list and putting in queue

I have the following for loop:

people = queue.Queue()
for person in set(list_):
    first_name,last_name = re.split(',| | ',person)
    people.put([first_name,last_name])

The list being iterated has 1,000,000+ items. It works, but takes a couple of seconds to complete.

What changes can I make to help the processing speed?

Edit: I should add that this is gevent's queue library.

Upvotes: 2

Views: 2253

Answers (4)

Blender

Reputation: 298156

I would try replacing regex with something a bit less intense:

first_name, last_name = person.split(', ')
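A rough way to check whether this helps is to time both approaches on sample data. The sketch below assumes every record looks like "First, Last" (the question does not show the actual format), and the sample list and function names are made up for illustration:

```python
import re
import timeit

# Hypothetical sample data; assumes every record looks like "First, Last".
sample = ["Ada, Lovelace", "Alan, Turing"] * 1000


def parse_with_regex(records):
    """Split each record with the regex engine."""
    return [re.split(', ', record) for record in records]


def parse_with_str_split(records):
    """Split each record with the plain string method."""
    return [record.split(', ') for record in records]


def compare(number=20):
    """Print the total time each parser needs for `number` passes."""
    for parser in (parse_with_regex, parse_with_str_split):
        seconds = timeit.timeit(lambda: parser(sample), number=number)
        print(parser.__name__, round(seconds, 4))


if __name__ == "__main__":
    compare()
```

The absolute numbers depend on the machine and Python version, so it is worth running this against a slice of the real list before committing to either version.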

Upvotes: 0

David K. Hess

Reputation: 17246

The question is: what is your queue being used for? If it isn't really necessary for threading purposes (or you can work around the threaded access), you want to switch to generators in this kind of situation. You can think of them as the Python version of Unix shell pipes. Your loop would then look like this:

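import re

def generate_people(list_):
    previous_row = None
    # Sorting puts duplicates next to each other, so each one can be
    # skipped by comparing it with the previous row.
    for person in sorted(list_):
        if person == previous_row:
            continue
        first_name, last_name = re.split(',| | ', person)
        yield [first_name, last_name]
        previous_row = person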

and you would use this generator like this:

for first_name, last_name in generate_people(list_):
    print first_name, last_name

This approach avoids what are probably your biggest performance hits: allocating memory to build a queue and a set with 1,000,000+ items in them. Instead, it works with one pair of strings at a time.

UPDATE

Based on more information about how threads play a role in this, I'd use this solution instead.

people = queue.Queue()
previous_row = None
for person in sorted(list_):
    if person == previous_row:
        continue
    first_name,last_name = re.split(',| | ',person)
    people.put([first_name,last_name])
    previous_row = person

This replaces the set() operation with something that should be more efficient.
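For what it's worth, the sort-and-skip pattern can also be written with itertools.groupby, which groups equal neighbours in the sorted input. This is only a sketch of an alternative spelling, not a measured improvement; the name unique_people is made up here, and the split pattern is copied from the question:

```python
import itertools
import re


def unique_people(list_):
    """Yield [first_name, last_name] once per distinct record."""
    # groupby collapses runs of equal items, so after sorting each
    # distinct record shows up exactly once as a group key.
    for person, _duplicates in itertools.groupby(sorted(list_)):
        first_name, last_name = re.split(',| | ', person)
        yield [first_name, last_name]
```

Each yielded pair can be passed to people.put() in the same way as in the loop above.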

Upvotes: 1

ttyunix

Reputation: 1

I think you could read the data with multiple threads and use a concurrent queue.

Upvotes: 0

Matt Joiner

Reputation: 118500

with people.mutex:
    people.queue.extend(list(re.split(',| | ',person)) for person in set(list_))
    people.not_empty.notify_all()

Note that this completely ignores the queue capacity, but avoids lots of excessive locking.
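As a self-contained sketch of the same idea, the helper below wraps the bulk insert in a function. It assumes the standard library queue.Queue on Python 3 and relies on its internal attributes (mutex, queue, not_empty, unfinished_tasks), which are implementation details; nothing here confirms that gevent's queue exposes the same ones. The name bulk_put and the sample records are made up for illustration:

```python
import queue
import re


def bulk_put(q, items):
    """Append many items to a standard library queue.Queue under one lock.

    Like the snippet above, this ignores q.maxsize entirely.
    """
    items = list(items)
    with q.mutex:
        q.queue.extend(items)
        # put() normally increments this counter; keeping it in sync
        # lets task_done() and join() behave as usual.
        q.unfinished_tasks += len(items)
        # Wake any consumers blocked in get().
        q.not_empty.notify_all()


# Hypothetical sample records using the separators from the question.
records = ["Ada,Lovelace", "Alan Turing", "Ada,Lovelace"]
people = queue.Queue()
bulk_put(people, (re.split(',| | ', person) for person in set(records)))
```

The unfinished_tasks update only matters if consumers call task_done() or join(); the original three-line version skips it.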

Upvotes: 1
