Reputation: 129
I'm using Python 2 and reading a file with roughly 400 domain names (one per line) into my script, testing whether each one matches the pattern '*.in' and, if it does, appending it to a list and writing the list to a new .txt file. However, the script only picks up the last .in domain in the file, even though there are several more. Any ideas?
#!/usr/bin/python
from fnmatch import fnmatch

newDomains = []
with open ('fishDomains.txt', 'r+') as f:
    for line in f:
        print line
        if fnmatch(line, '*.in') is True:
            print line
            newDomains.append(line)

with open('newFishDomains.txt', 'r+') as c:
    for item in newDomains:
        #print item
        c.write(item)
        c.write("\n")
Upvotes: 1
Views: 424
Reputation: 187
From my testing, I think the end-of-line (EOL) characters are causing the problem. In my environment (Windows 7), I used the test file shown below (with the EOL characters written out explicitly):
testline1.in\r\n
ttline2.in\r\n
line3.in
Applying your code to this file yields only ['line3.in'], because the last line is the only one with nothing after '.in'. I therefore suggest using strip(), which removes the end-of-line characters (LF or CRLF) along with any other leading and trailing whitespace.
My modified code is below:
with open ('fishDomains.txt', 'r+') as f:
    for line in f:
        line = line.strip() # <====================
        print line
        if fnmatch(line, '*.in') is True:
            print line
            newDomains.append(line)
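One thing to note: classic Mac OS (before OS X) used a lone CR as the EOL. strip() does remove a trailing CR, but Python 2 would not split such a file into lines unless it is opened in universal-newline mode ('rU'). That format has been obsolete for more than 10 years, so it should not be a problem now.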
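The effect of the line ending on the match can be checked without any file. The following is only a minimal sketch: the sample strings are made up to mimic what iterating over a file yields, and the names samples, raw_matches and stripped_matches are illustrative, not part of the original script.

```python
from fnmatch import fnmatch

# Hypothetical sample lines, imitating what "for line in f" hands back:
# the first two still carry their line endings, the last one does not.
samples = ['testline1.in\r\n', 'ttline2.in\n', 'line3.in']

# Without stripping, the pattern must match the whole string,
# so a trailing '\n' or '\r\n' after '.in' prevents a match.
raw_matches = [s for s in samples if fnmatch(s, '*.in')]

# After strip(), every sample ends in '.in' and matches.
stripped_matches = [s.strip() for s in samples if fnmatch(s.strip(), '*.in')]
```

Here raw_matches contains only 'line3.in', which mirrors the behaviour described in the question, while stripped_matches contains all three names.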
Upvotes: 1
Reputation: 189749
for line in f, where f is an opened file, returns entire lines, including the terminating newline.
You want to strip the line, and probably not use fnmatch for something that built-in string functions can do.
with open ('fishDomains.txt', 'r+') as f:
    for line in f:
        line = line.rstrip('\r\n')
        if line.endswith('.in'):
            print line
            newDomains.append(line)
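Note that rstrip('\r\n') and a bare strip() are not interchangeable: the former removes only trailing CR and LF characters, while the latter removes all leading and trailing whitespace. A small sketch of the difference, using a made-up input line (the variable names are illustrative only):

```python
# Hypothetical line with stray spaces around the domain name.
line = ' example.in \r\n'

only_eol_removed = line.rstrip('\r\n')  # spaces are kept
all_ws_removed = line.strip()           # spaces are removed too

# endswith() sees the trailing space in the first case.
matches_after_rstrip = only_eol_removed.endswith('.in')
matches_after_strip = all_ws_removed.endswith('.in')
```

If the input file might contain trailing spaces, strip() is the more forgiving choice; if whitespace is significant, rstrip('\r\n') leaves it alone.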
As an aside, you should usually take care that all the lines in your text files have a proper line ending character.
As another aside, the list variable is slightly clumsy, and not very scalable. For large files in particular, it makes sense to write out what you found as soon as possible, instead of collecting all the data in memory.
# 'w' creates (or truncates) the output file; 'r+' fails if it does not exist yet
with open('newFishDomains.txt', 'w') as c:
    with open ('fishDomains.txt', 'r') as f:
        for line in f:
            line = line.rstrip('\r\n')
            if line.endswith('.in'):
                print line
                c.write(line + '\n')
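The same streaming approach can be wrapped in a reusable function. This is a sketch under some assumptions: the function name filter_domains and its parameters are hypothetical, the output file is opened with 'w' so that it is created if missing, and the code avoids print so that it runs unchanged on Python 2.7 and Python 3.

```python
import os
import tempfile


def filter_domains(src_path, dst_path, suffix='.in'):
    """Copy every line of src_path that ends in suffix to dst_path.

    Lines are written as soon as they are found, so memory use stays
    constant regardless of the input size. Returns the number of matches.
    """
    count = 0
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            line = line.rstrip('\r\n')  # drop LF or CRLF only
            if line.endswith(suffix):
                dst.write(line + '\n')
                count += 1
    return count
```

It could be called as filter_domains('fishDomains.txt', 'newFishDomains.txt') to reproduce the script above.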
Finally, in shell, this is obviously a one-liner:
grep '\.in$' fishDomains.txt >newFishDomains.txt
Upvotes: 1