Reputation: 6832
What would be the best/fastest way to accomplish the following: I have a big file I need to update. I won't load it into memory but read it line by lien like this.
with open(self.f, "rb") as f_in:
for line in f_in:
l = line.strip().split(',')
For each line there are potentially two different update scenarios. That is, two large lists/tuples with the update information. For each line I have to check if, let's say l[0]
, is meeting a condition in list one, if not check for another condition in list two. I am wondering what would be wise here as I am potentially running into performance issues. My first idea was to delete the item from the list/tuple if it was matched so the list becomes smaller and smaller with runtime.
Upvotes: 0
Views: 70
Reputation: 1123400
To test for membership against a series of values, use a set
instead of a list.
Like dictionary lookups, set membership tests are O(1) operations. Cheap, independent of the size of the set.
set_one = {'some_value', 'some_other_value', ...}
# ...
if l[0] in set_one:
# do something.
If you needed to map values, use a dictionary:
dict_one = {'some_value': 'item1', 'some_other_value': 'item2', ...}
# ...
if l[0] in dict_one:
item = dict_one[l[0]]
This all depends on exactly what kind of lookups you are trying to do; many different kinds of lookups can be made very efficient with the right data structures. Looping over large lists for every line in a file is usually not the best option.
Upvotes: 5