neacal

Reputation: 45

Empty output after appending a list

r = ","
x = ""
output = list()
import string

def find_word(filepath,keyword):
    doc = open(filepath, 'r')

    for line in doc:
        #Remove all the unnecessary characters
        line = line.replace("'", r)
        line = line.replace('`', r)
        line = line.replace('[', r)
        line = line.replace(']', r)
        line = line.replace('{', r)
        line = line.replace('}', r)
        line = line.replace('(', r)
        line = line.replace(')', r)
        line = line.replace(':', r)
        line = line.replace('.', r)
        line = line.replace('!', r)
        line = line.replace('?', r)
        line = line.replace('"', r)
        line = line.replace(';', r)
        line = line.replace(' ', r)
        line = line.replace(',,', r)
        line = line.replace(',,,', r)
        line = line.replace(',,,,', r)
        line = line.replace(',,,,,', r)
        line = line.replace(',,,,,,', r)
        line = line.replace(',,,,,,,', r)
        line = line.replace('#', r)
        line = line.replace('*', r)
        line = line.replace('**', r)
        line = line.replace('***', r)

        #Make the line lowercase
        line = line.lower()

        #Split the line at every r (comma) and name the result "words"
        words = line.split(r)

        #if the keyword (also in lowercase form) appears in the before created words list
        #then append the list output by the whole line in which the keyword appears

        if keyword.lower() in words:
            output.append(line)

    return output

print find_word("pg844.txt","and")

The goal of this piece of code is to search through a text file for a certain keyword, say "and", and to collect every line in which the keyword is found into a list of (int, string) tuples. The int should be the line number and the string the whole line in which the keyword occurs.

I'm still working on the line numbering - so no question concerning that yet. But the problem is: The output is empty. Even if I append a random string instead of the line, I don't get any results.

If I use

if keyword.lower() in words:
        print line

I get all the desired lines, in which the keyword occurs. But I just can't get it into the output list.

The text file I'm trying to search through: http://www.gutenberg.org/cache/epub/844/pg844.txt

Upvotes: 1

Views: 103

Answers (3)

Kody

Reputation: 965

Please use regular expressions (see the documentation for the re module in Python). Replacing every character or character sequence one call at a time is confusing. The use of lists and .append() looks correct, but try debugging your line variable within the for loop, printing it occasionally to ensure its value is what you want it to be.
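As a rough sketch of what a regex-based check could look like: the helper name contains_word is made up for illustration, and it assumes that "contains the keyword" means a whole-word, case-insensitive match as defined by \b. Under those assumptions a single pattern can stand in for the chain of replace() calls:

```python
import re

def contains_word(line, keyword):
    """Return True if keyword occurs in line as a whole word, ignoring case."""
    # re.escape guards against keywords that contain regex metacharacters;
    # \b matches at word boundaries, so "and" does not match inside "hand".
    pattern = r'\b' + re.escape(keyword) + r'\b'
    return re.search(pattern, line, re.IGNORECASE) is not None

print(contains_word("Jack and Algernon.", "AND"))  # True
print(contains_word("He shook her hand.", "and"))  # False
```

Because the match is done on the original line, the line itself never has to be modified before it is stored.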

An answer by pyInProgress makes a good point about global variables, though without testing it, I'm not convinced it's required if the output return variable is used instead of the global output variable. See this StackOverflow post if you need more information about global variables.
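A small experiment can settle the question about global. The sketch below uses made-up names (results, counter, and the three helper functions) and only illustrates the general Python rule: calling a method on a global list works without a declaration, while assigning to a global name does not.

```python
results = []   # module-level (global) list
counter = 0    # module-level (global) integer

def add_result(item):
    # Calling a method mutates the existing list object; no 'global' needed.
    results.append(item)

def bump_without_global():
    # The assignment makes 'counter' local to this function,
    # so reading it before it has a local value fails.
    counter += 1

def bump_with_global():
    global counter  # required because the name is rebound
    counter += 1

add_result("a line")
bump_with_global()
try:
    bump_without_global()
except UnboundLocalError:
    print("rebinding without 'global' raised UnboundLocalError")

print(results)  # ['a line']
print(counter)  # 1
```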

Upvotes: 2

Cody Bouche

Reputation: 955

Loop through string.punctuation to strip every punctuation character except the comma before iterating through the lines:

import string, re

r = ','

def find_word(filepath, keyword):

    output = []
    with open(filepath, 'rb') as f:
        data = f.read()
        for x in list(string.punctuation):
            if x != r:
                data = data.replace(x, '')
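        # Collapse runs of commas into one. re.sub's fourth positional
        # argument is count, not flags, so re.M must not be passed there.
        data = re.sub(r',{2,}', r, data).splitlines()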

    for i, line in enumerate(data):
        if keyword.lower() in line.lower().split(r):
            output.append((i, line))
    return output

print find_word('pg844.txt', 'and')

Upvotes: 1

pyInProgress

Reputation: 24

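Since output = list() is at the top level of your code and isn't inside a function, it is a global variable. To rebind a global variable within a function (that is, to assign to its name), you must declare it with the global keyword first. Mutating the object in place, as output.append() does, works without the declaration, although declaring it makes the intent explicit.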

Example:

gVar = 10

def editVar():
    global gVar
    gVar += 5

So to make it explicit that your function find_word() works on the global variable output, declare global output inside the function before using it.

It should look like this:

r = ","
x = ""
output = list()
import string

def find_word(filepath,keyword):
    doc = open(filepath, 'r')

    for line in doc:
        #Remove all the unnecessary characters
        line = line.replace("'", r)
        line = line.replace('`', r)
        line = line.replace('[', r)
        line = line.replace(']', r)
        line = line.replace('{', r)
        line = line.replace('}', r)
        line = line.replace('(', r)
        line = line.replace(')', r)
        line = line.replace(':', r)
        line = line.replace('.', r)
        line = line.replace('!', r)
        line = line.replace('?', r)
        line = line.replace('"', r)
        line = line.replace(';', r)
        line = line.replace(' ', r)
        line = line.replace(',,', r)
        line = line.replace(',,,', r)
        line = line.replace(',,,,', r)
        line = line.replace(',,,,,', r)
        line = line.replace(',,,,,,', r)
        line = line.replace(',,,,,,,', r)
        line = line.replace('#', r)
        line = line.replace('*', r)
        line = line.replace('**', r)
        line = line.replace('***', r)

        #Make the line lowercase
        line = line.lower()

        #Split the line at every r (comma) and name the result "words"
        words = line.split(r)

        #if the keyword (also in lowercase form) appears in the before created words list
        #then append the list output by the whole line in which the keyword appears

        global output
        if keyword.lower() in words:
            output.append(line)

    return output

In the future, try to stay away from global variables unless you absolutely need them. They can get messy!
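One way to avoid the global altogether is to build the list inside the function and return it. This is only a sketch under a few assumptions: the name find_word_in_lines is hypothetical, it takes any iterable of lines (an open file object would also work), and it treats every run of non-letter characters as a word separator.

```python
import re

def find_word_in_lines(lines, keyword):
    """Return (line_number, line) tuples for lines containing keyword as a word."""
    matches = []  # local list: a fresh one on every call
    keyword = keyword.lower()
    for number, line in enumerate(lines, 1):
        # Split on anything that is not a letter to get the words of the line.
        words = re.split(r'[^a-z]+', line.lower())
        if keyword in words:
            matches.append((number, line.rstrip('\n')))
    return matches

sample = ["Jack and Algernon\n", "A handbag?\n", "Bread AND butter.\n"]
print(find_word_in_lines(sample, "and"))
# [(1, 'Jack and Algernon'), (3, 'Bread AND butter.')]
```

With a local list, calling the function twice does not accumulate results from the first call, which is what would happen with a shared global list.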

Upvotes: 0
