LarsVegas

Reputation: 6832

How to loop over three different data collections effectively in Python?

What would be the best/fastest way to accomplish the following? I have a big file I need to update. I won't load it into memory; instead I read it line by line, like this:

with open(self.f, "rb") as f_in:
    for line in f_in:
        l = line.strip().split(',')

For each line there are potentially two different update scenarios, each backed by a large list/tuple holding the update information. For each line I have to check whether, say, l[0] meets a condition in list one and, if not, check for another condition in list two. I am wondering what would be wise here, as I may run into performance issues. My first idea was to delete an item from the list/tuple once it has been matched, so the list becomes smaller and smaller as the run progresses.

Upvotes: 0

Views: 70

Answers (1)

Martijn Pieters

Reputation: 1123400

To test for membership against a series of values, use a set instead of a list.

Like dictionary lookups, set membership tests are O(1) operations on average: cheap, and independent of the size of the set.

set_one = {'some_value', 'some_other_value', ...}

# ...
if l[0] in set_one:
    # do something.
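
To get a feel for the difference, here is a rough timing sketch. The collection size, the `needle` value and the repeat count are arbitrary assumptions chosen for illustration; actual numbers will vary by machine.

```python
import timeit

# 100,000 string keys, stored both as a list and as a set.
values = [str(i) for i in range(100000)]
as_list = values
as_set = set(values)
needle = '99999'  # last element: the worst case for a linear list scan

# The list test scans element by element; the set test is a hash lookup.
list_time = timeit.timeit(lambda: needle in as_list, number=100)
set_time = timeit.timeit(lambda: needle in as_set, number=100)
print('list: %.6fs  set: %.6fs' % (list_time, set_time))
```

On a typical machine the set lookup should come out orders of magnitude faster, and the gap widens as the collection grows.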

If you need to map values to something else, use a dictionary:

dict_one = {'some_value': 'item1', 'some_other_value': 'item2', ...}

# ...
if l[0] in dict_one:
    item = dict_one[l[0]]

This all depends on exactly what kind of lookups you are trying to do; many different kinds of lookups can be made very efficient with the right data structures. Looping over large lists for every line in a file is usually not the best option.
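
Applied to the line-by-line loop from the question, this could look roughly like the sketch below. The names `update_lines`, `set_one` and `dict_two`, and what happens on a match, are placeholders assumed for illustration, since the question does not say what the two conditions or updates actually are.

```python
import io


def update_lines(lines, set_one, dict_two):
    """Yield updated rows; the update rules here are hypothetical."""
    for line in lines:
        fields = line.strip().split(',')
        key = fields[0]
        if key in set_one:
            # Scenario one (assumed): flag the row as matched.
            fields.append('matched_one')
        elif key in dict_two:
            # Scenario two (assumed): replace the key with its mapped value.
            fields[0] = dict_two[key]
        yield ','.join(fields)


# Small in-memory stand-in for the big file.
sample = io.StringIO("a,1\nb,2\nc,3\n")
result = list(update_lines(sample, {'a'}, {'b': 'B'}))
print(result)  # ['a,1,matched_one', 'B,2', 'c,3']
```

Since each check is a hash lookup, there should be no need to delete matched items to shrink the collections as you go.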

Upvotes: 5
