Reputation: 45
r = ","
x = ""
output = list()
import string
def find_word(filepath,keyword):
    doc = open(filepath, 'r')
    for line in doc:
        #Remove all the unnecessary characters
        line = line.replace("'", r)
        line = line.replace('`', r)
        line = line.replace('[', r)
        line = line.replace(']', r)
        line = line.replace('{', r)
        line = line.replace('}', r)
        line = line.replace('(', r)
        line = line.replace(')', r)
        line = line.replace(':', r)
        line = line.replace('.', r)
        line = line.replace('!', r)
        line = line.replace('?', r)
        line = line.replace('"', r)
        line = line.replace(';', r)
        line = line.replace(' ', r)
        line = line.replace(',,', r)
        line = line.replace(',,,', r)
        line = line.replace(',,,,', r)
        line = line.replace(',,,,,', r)
        line = line.replace(',,,,,,', r)
        line = line.replace(',,,,,,,', r)
        line = line.replace('#', r)
        line = line.replace('*', r)
        line = line.replace('**', r)
        line = line.replace('***', r)
        #Make the line lowercase
        line = line.lower()
        #Split the line at every r (comma) and name the result "words"
        words = line.split(r)
        #If the keyword (also in lowercase form) appears in the words list created above,
        #append the whole line in which the keyword appears to the output list
        if keyword.lower() in words:
            output.append(line)
    return output
print(find_word("pg844.txt","and"))
The goal of this piece of code is to search through a text file for a certain keyword, say "and", and then put each whole line in which the keyword is found into a list of (int, string) tuples. The int should be the line number and the string the whole line itself.
I'm still working on the line numbering, so no question about that yet. The problem is that the output is empty. Even if I append a random string instead of the line, I don't get any results.
If I use
    if keyword.lower() in words:
        print(line)
I get all the desired lines in which the keyword occurs. But I just can't get them into the output list.
The text file I'm trying to search through: http://www.gutenberg.org/cache/epub/844/pg844.txt
Upvotes: 1
Views: 103
Reputation: 965
Please use regex. See the Python documentation on regular expressions (the re module). Replacing every character or character set one at a time is confusing. The use of lists and .append() looks correct, but perhaps debug your line variable within the for loop, printing it occasionally to ensure its value is what you want it to be.
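A minimal sketch of the regex approach, written for Python 3. It assumes a "word" is any run of letters, digits, or underscores (the regex `\w+`), so "don't" splits into "don" and "t", much as the original replace chain does. `WORD_RE`, `find_word_in_lines`, and `find_word_in_file` are hypothetical names, and the UTF-8 encoding is an assumption about the input file. It returns `(line_number, line)` tuples, the format the question is after.

```python
import re

# Assumption: a word is a maximal run of letters, digits or underscores.
WORD_RE = re.compile(r"\w+")


def find_word_in_lines(lines, keyword):
    """Return (line_number, line) for every line containing keyword as a whole word."""
    keyword = keyword.lower()
    output = []  # local list, rebuilt on every call
    for number, line in enumerate(lines, start=1):
        words = WORD_RE.findall(line.lower())
        if keyword in words:
            output.append((number, line.rstrip("\n")))
    return output


def find_word_in_file(filepath, keyword):
    # Assumption: the text file is UTF-8 encoded.
    with open(filepath, encoding="utf-8") as doc:
        return find_word_in_lines(doc, keyword)


# Made-up sample lines, used instead of a real file.
sample = ["Algernon and Jack\n", "Sandwiches, please.\n", "AND now: tea!\n"]
print(find_word_in_lines(sample, "and"))
# [(1, 'Algernon and Jack'), (3, 'AND now: tea!')]
```

Keeping `output` local to the function means each call starts from an empty list, so no `global` statement is involved.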
An answer by pyInProgress makes a good point about global variables, though, without testing it, I'm not convinced that global is required if the returned output value is used instead of the global output variable. See this StackOverflow post if you need more information about global variables.
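Whether `global` is needed depends on whether a function rebinds a module-level name or only mutates the object that name refers to. The small Python 3 sketch below uses hypothetical names (`counter`, `items`, and the three functions) to show both cases: `+=` on an int assigns a new object to the name and therefore needs `global`, while `append` changes the existing list in place and does not.

```python
counter = 10  # module-level int (immutable)
items = []    # module-level list (mutable)


def rebind_counter():
    global counter  # required: "+=" binds the name to a new int object
    counter += 5


def mutate_items():
    items.append("x")  # no global needed: the existing list is modified in place


def broken_rebind():
    counter += 5  # assignment makes counter local here, so this raises UnboundLocalError


rebind_counter()
mutate_items()
try:
    broken_rebind()
except UnboundLocalError:
    print("broken_rebind raised UnboundLocalError")

print(counter, items)  # 15 ['x']
```

Since the question's function only calls `output.append(...)`, the global list is mutated, not rebound, and this sketch suggests that `global output` is not what makes the difference there.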
Upvotes: 2
Reputation: 955
Loop through string.punctuation to strip punctuation from the whole text before iterating through the lines:
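import string, re
r = ','
def find_word(filepath, keyword):
    output = []
    # Text mode, so data is a str on both Python 2 and Python 3
    with open(filepath, 'r') as f:
        data = f.read()
    # Remove every punctuation character except the comma used as the delimiter
    for x in string.punctuation:
        if x != r:
            data = data.replace(x, '')
    # Turn spaces into delimiters so that every word is comma-separated
    data = data.replace(' ', r)
    # Collapse runs of commas into one (re.sub's 4th positional argument is count, not flags)
    data = re.sub(r',{2,}', r, data).splitlines()
    # Start counting at 1 so that i matches the line numbers of the file
    for i, line in enumerate(data, 1):
        if keyword.lower() in line.lower().split(r):
            output.append((i, line))
    return output
print(find_word('pg844.txt', 'and'))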
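If the goal is just to split each line into lowercase words, a single-pass alternative to looping over `string.punctuation` is a translation table. This sketch assumes Python 3's `str.maketrans` and `str.translate`, and it maps punctuation to spaces rather than deleting it, so "Jack...and" still splits into two words. `PUNCT_TO_SPACE` and `words_of` are hypothetical names.

```python
import string

# One table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def words_of(line):
    """Blank out punctuation, lowercase, and split on any run of whitespace."""
    return line.translate(PUNCT_TO_SPACE).lower().split()


print(words_of("Well, Jack... and (of course) Gwendolen!"))
# ['well', 'jack', 'and', 'of', 'course', 'gwendolen']
```

With this helper the membership test becomes `keyword.lower() in words_of(line)`, and no comma delimiter or comma-collapsing step is needed, because `split()` with no arguments already ignores repeated whitespace.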
Upvotes: 1
Reputation: 24
Since output = list() is at the top level of your code and isn't inside a function, it is considered a global variable. To edit a global variable within a function, you must use the global keyword first.
Example:
gVar = 10
def editVar():
    global gVar
    gVar += 5
So to edit the variable output within your function find_word(), you must type global output before assigning it values. It should look like this:
r = ","
x = ""
output = list()
import string
def find_word(filepath,keyword):
    # Declare that output refers to the module-level list
    global output
    doc = open(filepath, 'r')
    for line in doc:
        #Remove all the unnecessary characters
        line = line.replace("'", r)
        line = line.replace('`', r)
        line = line.replace('[', r)
        line = line.replace(']', r)
        line = line.replace('{', r)
        line = line.replace('}', r)
        line = line.replace('(', r)
        line = line.replace(')', r)
        line = line.replace(':', r)
        line = line.replace('.', r)
        line = line.replace('!', r)
        line = line.replace('?', r)
        line = line.replace('"', r)
        line = line.replace(';', r)
        line = line.replace(' ', r)
        line = line.replace(',,', r)
        line = line.replace(',,,', r)
        line = line.replace(',,,,', r)
        line = line.replace(',,,,,', r)
        line = line.replace(',,,,,,', r)
        line = line.replace(',,,,,,,', r)
        line = line.replace('#', r)
        line = line.replace('*', r)
        line = line.replace('**', r)
        line = line.replace('***', r)
        #Make the line lowercase
        line = line.lower()
        #Split the line at every r (comma) and name the result "words"
        words = line.split(r)
        #If the keyword (also in lowercase form) appears in the words list created above,
        #append the whole line in which the keyword appears to the output list
        if keyword.lower() in words:
            output.append(line)
    return output
In the future, try to stay away from global variables unless you absolutely need them. They can get messy!
Upvotes: 0