Reputation: 875
I have a large text file on my computer (location: /home/Seth/documents/bruteforce/passwords.txt) and I'm trying to find a specific string in it. The list has one word per line and 215,000 lines/words. Does anyone know of a simple Python script I can use to find a specific string?
Here's the code I have so far:
f = open("home/seth/documents/bruteforce/passwords.txt", "r")
for line in f.readlines():
    line = str(line.lower())
    print str(line)
    if str(line) == "abe":
        print "success!"
    else:
        print str(line)
I keep running the script, but it never finds the word in the file (and I know for sure the word is in the file).
Is there something wrong with my code? Is there a simpler method than the one I'm trying to use?
Your help is greatly appreciated.
P.S.: I'm using Python 2.7 on a Debian Linux laptop.
Upvotes: 0
Views: 24461
Reputation: 16154
I'd rather use the in keyword to look for a string in a line. Here I'm looking for the keyword 'KHANNA' in a CSV file; if it appears in any line, the code returns True.
In [121]: with open('data.csv') as f:
   .....:     print any('KHANNA' in line for line in f)
   .....:
True
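One thing to keep in mind: in tests for a substring, so 'abe' in line is also true for lines such as "label" or "abel". For a one-word-per-line list you may want an exact match instead. A minimal sketch comparing the two; the function names are hypothetical, and a list of strings stands in for an open file (iterating over a file yields lines in the same form, newline included):

```python
def contains_substring(lines, word):
    # True if word appears anywhere inside any line ('abe' matches 'label')
    return any(word in line for line in lines)


def contains_line(lines, word):
    # True only if some line, minus its trailing newline, equals word exactly
    return any(line.rstrip("\n") == word for line in lines)


# A list of strings stands in for an open file object here
sample = ["label\n", "abel\n", "cabernet\n"]
print(contains_substring(sample, "abe"))  # True: 'abe' occurs inside other words
print(contains_line(sample, "abe"))       # False: no line is exactly 'abe'
```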
Upvotes: 3
Reputation: 28656
What do you want to do? Just test whether the word is in the file? Here:
print 'abe' in open("passwords.txt").read().split()
Or:
print 'abe' in map(str.strip, open("passwords.txt"))
Or if it doesn't have to be Python:
egrep '^abe$' passwords.txt
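The egrep pattern works because ^ and $ anchor the match to a whole line. If you want the same check from Python, a rough equivalent uses the re module with re.MULTILINE, so that ^ and $ match at every line boundary. This is a sketch, and has_exact_line is a hypothetical helper name:

```python
import re


def has_exact_line(text, word):
    # With re.MULTILINE, ^ and $ match at the start/end of each line,
    # mirroring egrep '^abe$'; re.escape keeps regex metacharacters literal
    pattern = r"^" + re.escape(word) + r"$"
    return re.search(pattern, text, re.MULTILINE) is not None


text = "abel\nabe\ncabernet\n"
print(has_exact_line(text, "abe"))   # True
print(has_exact_line(text, "aber"))  # False
```

With the real file you would pass open("passwords.txt").read() as text.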
EDIT: Oh, I forgot the lower(), probably because passwords are usually case-sensitive. But if it really does make sense in your case:
print 'abe' in open("passwords.txt").read().lower().split()
or
print 'abe' in (line.strip().lower() for line in open("passwords.txt"))
or
print 'abe' in map(str.lower, map(str.strip, open("passwords.txt")))
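Each of these one-liners rereads the whole file for every word you test. If you end up checking many candidate words against the same 215,000-line list (an assumption about your use case, not something the question states), one option is to load the stripped, lowercased lines into a set once; a set membership test takes roughly constant time on average, while the list and generator versions scan every line. A minimal sketch with a hypothetical load_words helper:

```python
def load_words(lines):
    # Strip whitespace (including the newline) and lowercase each line,
    # like the line.strip().lower() variant above, then keep them in a set
    return set(line.strip().lower() for line in lines)


# In practice: with open("passwords.txt") as f: words = load_words(f)
words = load_words(["Abe\n", "abel\n", "Cabernet\n"])
for candidate in ["abe", "ABEL", "zebra"]:
    # Lowercase the candidate too, so the comparison is case-insensitive
    print("%s %s" % (candidate, candidate.lower() in words))
```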
Upvotes: 1
Reputation: 36
It's just because you forgot to strip the newline character from the end of each line. Replacing line = str(line.lower()) with
line = line.strip().lower()
would help.
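Applied to the loop from the question, that fix might look like the sketch below. It is wrapped in a hypothetical find_word function that stops at the first match instead of printing every line, and it accepts any iterable of lines, such as an open file:

```python
def find_word(lines, target):
    # Remove surrounding whitespace (including the trailing newline) and
    # lowercase each line before comparing it with the target word
    for line in lines:
        line = line.strip().lower()
        if line == target:
            return True
    return False


# With the real file this would be:
#   with open("/home/Seth/documents/bruteforce/passwords.txt") as f:
#       print(find_word(f, "abe"))
print(find_word(["Abel\n", "ABE\n"], "abe"))  # True: 'ABE\n' becomes 'abe'
```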
Upvotes: 1
Reputation: 114035
Your script doesn't find the line because you didn't check for the newline characters:
Your file is made of many "lines". Each "line" ends with a character that you didn't account for: the newline character ('\n')[1]. It is what gets written to the file when you press Enter, and it is what starts the next line.
So, when you read the lines out of your file, each string actually ends with a newline character, which is why your equality test fails. Instead, you should test equality against the line after it has been stripped of this newline character:
with open("home/seth/documents/bruteforce/passwords.txt") as infile:
    for line in infile:
        line = line.rstrip('\n')
        if line == "abe":
            print 'success!'
[1] Note that on some systems (notably Windows), a line ends with two characters: a carriage return (CR) followed by a line feed (LF), which appears in the file as '\r\n'. If such a file is read without newline translation, rstrip('\n') leaves the '\r' behind. The terminology comes from typewriters: a line feed advanced the paper by one line, and a carriage return moved the carriage holding the paper back to its starting position.
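To make the CRLF case concrete: in Python 2.7 on Linux, open(path, "r") does no newline translation, so a line from a file written on Windows arrives as 'abe\r\n'. (Python 3's default text mode translates '\r\n' to '\n' for you.) The string operations below show which stripping calls cope with that:

```python
# A line as read from a CRLF file when no newline translation happens
raw = "abe\r\n"

print(repr(raw.rstrip("\n")))        # 'abe\r': the carriage return survives
print(raw.rstrip("\n") == "abe")     # False
print(raw.rstrip("\r\n") == "abe")   # True: rstrip removes any mix of the given chars
print(raw.strip() == "abe")          # True: strip() removes surrounding whitespace
```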
Upvotes: 0
Reputation: 6419
Usually, when you read lines out of a file, each one has a newline character at the end, so it is technically not equal to the same string without that character. You can get rid of it by adding line = line.strip() before the equality test against your target string. By default, the strip() method removes all leading and trailing whitespace (including newlines) from the string it is called on.
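strip() is more forgiving than the rstrip('\n') used in another answer: it also removes leading and trailing spaces and tabs. Which one you want depends on whether surrounding whitespace can be a meaningful part of an entry in your list (an assumption you would need to check against your data). A small comparison:

```python
lines = ["abe\n", "  abe\n", "abe \t\n"]

# strip() removes leading and trailing whitespace, so all three match
print([line.strip() == "abe" for line in lines])        # [True, True, True]

# rstrip("\n") removes only trailing newlines, so surrounding
# spaces and tabs are kept and the last two lines do not match
print([line.rstrip("\n") == "abe" for line in lines])   # [True, False, False]
```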
Upvotes: 1