Roelland

Reputation: 193

Fast way to check if a string is in a huge text file

I'm looking for an easy way to check whether all the strings in a list appear in a huge text file (more than 35,000 words).

self.vierkant = ['BIT', 'ICE', 'TEN']


def geldig(self, file):
    self.file = file
    file = open(self.file, 'r')
    line = file.readline()
    self.file = ''

    while line:
        line = line.strip('\n')
        self.file += line
        line = file.readline()

    return len([woord for woord in self.vierkant if woord.lower() not in self.file]) == 0

I just copy the text file into self.file, then check if all words from self.vierkant are in self.file.

The main problem is that it takes a very long time to read in the text file. Is there an easier/faster way to do this?

Upvotes: 2

Views: 2533

Answers (2)

宏杰李

Reputation: 12178

with open('a.txt') as f:
    # splitlines() drops the trailing '\n' and returns a list of lines.
    s = set(f.read().splitlines())
for line in test_lines:
    line in s  # average O(1) check of whether the line is in the set of lines

Upvotes: 0

Eugene Yarmash

Reputation: 150225

You can read the entire contents of a file with file.read() instead of calling readline() repeatedly and concatenating the result:

with open(self.file) as f:
    self.file = f.read()
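
As a rough sketch of how this could replace the loop in the question, the function below reads the file once and then performs the same substring test. The name all_words_in_text and its parameters are made up for illustration. Unlike the original loop, the newlines are kept in the text, so a match cannot span two lines.

```python
def all_words_in_text(words, path):
    """Return True if every word occurs as a substring of the file's text."""
    with open(path) as f:
        text = f.read()  # a single read instead of repeated readline() calls
    # all() stops at the first word that is missing.
    return all(word.lower() in text for word in words)
```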

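If you need to check many words, you could also build a set from the file's words, which gives O(1) average-time membership checks. Note that a set matches whole words only, whereas the in operator on a string also matches substrings.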
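
A minimal sketch of the set approach, assuming the file holds whitespace-separated words (for example, one word per line) and that lowercase comparison is wanted, as in the question. The helper names load_words and all_words_present are hypothetical.

```python
def load_words(path):
    """Return the set of lowercase, whitespace-separated words in the file."""
    with open(path) as f:
        return set(f.read().lower().split())


def all_words_present(words, word_set):
    """Return True if every word is a member of word_set (whole-word match)."""
    return all(word.lower() in word_set for word in words)
```

Building the set once and reusing it pays off when many lists have to be checked against the same file.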

Upvotes: 2
