Asa Hunt

Reputation: 129

Why is my script only working for the last line of my text file?

I'm using Python 2 to read a file of about 400 domain names (one per line), test whether each one matches the pattern '*.in', collect the matches in a list, and write that list to a new .txt file. However, the script only picks up the last .in domain in the file, even though there are several more. Any ideas?

#!/usr/bin/python

from fnmatch import fnmatch

newDomains = []


with open ('fishDomains.txt', 'r+') as f:
    for line in f:
        print line
        if fnmatch(line, '*.in') is True:
            print line
            newDomains.append(line)

with open('newFishDomains.txt', 'r+') as c:
    for item in newDomains:
        #print item
        c.write(item)
        c.write("\n")

Upvotes: 1

Views: 424

Answers (2)

Jkm

Reputation: 187

From my testing, I think the end-of-line (EOL) characters are causing the problem. In my environment (Windows 7), I used the following test file (the EOL characters are shown explicitly):

testline1.in\r\n
ttline2.in\r\n
line3.in

Applying your code to this file yields only ['line3.in'], because every other line still carries its line ending and therefore does not end in '.in'. I suggest calling strip(), which removes the line ending (LF or CRLF) along with any other leading and trailing whitespace.
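To see the effect in isolation, here is a minimal sketch that runs fnmatch on lines with and without their line endings. The sample strings are made up for illustration, and the snippet is written so that it should run on both Python 2 and Python 3.

```python
from fnmatch import fnmatch

# Hypothetical sample lines, as they might come out of "for line in f".
raw_lines = ['testline1.in\r\n', 'ttline2.in\n', 'line3.in']

# fnmatch requires the pattern to match the whole string, so a trailing
# '\n' or '\r\n' after '.in' prevents a match.
matches_raw = [line for line in raw_lines if fnmatch(line, '*.in')]

# After strip(), every sample line ends in '.in' and matches.
matches_stripped = [line.strip() for line in raw_lines
                    if fnmatch(line.strip(), '*.in')]
```

Only the last sample line matches before stripping, which mirrors the behaviour described in the question.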

My modified code is below:

with open('fishDomains.txt', 'r') as f:   # read-only access is enough here
    for line in f:
        line = line.strip()   # remove the line ending and surrounding whitespace
        print line
        if fnmatch(line, '*.in'):
            print line
            newDomains.append(line)

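Note that classic Mac OS (before OS X) used a bare CR as its line ending. strip() does remove a CR, but Python 2 will not split such a file into separate lines unless it is opened in universal-newline mode (for example with 'rU'). That convention was dropped more than ten years ago, so it should rarely be a problem now.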
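If CR-only files are a concern, one option is to read with universal newlines enabled. The following is a rough sketch, not part of the original script: read_lines_universal is a hypothetical helper name, the sample data is invented, and io.open is used because it behaves the same way on Python 2 and Python 3.

```python
import io
import os
import tempfile

def read_lines_universal(path):
    """Return the lines of a text file with the line ending removed.

    io.open with newline=None enables universal newlines, so CR, LF
    and CRLF are all recognised and handed back as a single LF.
    """
    with io.open(path, 'r', newline=None) as f:
        return [line.rstrip('\n') for line in f]

# Build a small CR-only sample file (hypothetical data) to try it on.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as tmp:
    tmp.write(b'testline1.in\rttline2.com\rline3.in\r')

cr_lines = read_lines_universal(sample_path)
os.remove(sample_path)
```

With this, the three CR-terminated records come back as three separate lines instead of one long string.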

Upvotes: 1

tripleee

Reputation: 189749

Iterating with for line in f, where f is an open file, yields entire lines, including the terminating newline.

You want to strip the line ending, and probably not use fnmatch for something the built-in string methods can do.

with open('fishDomains.txt', 'r') as f:   # read-only access is enough here
    for line in f:
        line = line.rstrip('\r\n')   # drop a trailing LF or CRLF
        if line.endswith('.in'):
            print line
            newDomains.append(line)

As an aside, you should usually take care that all the lines in your text files have a proper line ending character.
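One way to check this is to look at the last byte of the file. A minimal sketch, assuming LF or CRLF line endings; ends_with_newline and _make_sample are hypothetical helper names, and the sample files are invented for the demonstration:

```python
import os
import tempfile

def ends_with_newline(path):
    """Return True if the file is non-empty and its last byte is LF."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False          # empty file: no line ending at all
        f.seek(-1, os.SEEK_END)   # step back to the final byte
        return f.read(1) == b'\n'

def _make_sample(data):
    """Write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as tmp:
        tmp.write(data)
    return path

good = _make_sample(b'a.in\nb.in\n')   # last line is terminated
bad = _make_sample(b'a.in\nb.in')      # last line is not terminated
good_ok, bad_ok = ends_with_newline(good), ends_with_newline(bad)
os.remove(good)
os.remove(bad)
```

A file like the second sample is exactly the case where only the final line matched in the original script.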

As another aside, the list variable is slightly clumsy and not very scalable. For large files in particular, it makes sense to write out each match as soon as you find it, instead of collecting all the data in memory.

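# 'w' creates the output file if it is missing and truncates it otherwise;
# 'r+' would fail on a missing file and leave stale data in an existing one.
with open('newFishDomains.txt', 'w') as c:
    with open('fishDomains.txt', 'r') as f:
        for line in f:
            line = line.rstrip('\r\n')   # drop a trailing LF or CRLF
            if line.endswith('.in'):
                print line
                c.write(line + '\n')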

Finally, in shell, this is obviously a one-liner:

grep '\.in$' fishDomains.txt >newFishDomains.txt

Upvotes: 1
