Reputation: 197
i'm trying to create a script which makes requests to random urls from a txt file
import urllib2
with open('urls.txt') as urls:
for url in urls:
try:
r = urllib2.urlopen(url)
except urllib2.URLError as e:
r = e
if r.code in (200, 401):
print '[{}]: '.format(url), "Up!"
elif r.code == 404:
print '[{}]: '.format(url), "Not Found!"
But I want that when some url does 404 not found erase from the file. Each url is per line, so basically is to erase every url that does 404 not found. How to do it?!
Upvotes: 1
Views: 605
Reputation: 140659
In order to delete lines from a file, you have to rewrite the entire contents of the file. The safest way to do that is to write out a new file in the same directory and then rename
it over the old file. I'd modify your code like this:
import os
import sys
import tempfile
import urllib2
good_urls = set()
with open('urls.txt') as urls:
for url in urls:
try:
r = urllib2.urlopen(url)
except urllib2.URLError as e:
r = e
if r.code in (200, 401):
sys.stdout.write('[{}]: Up!\n'.format(url))
good_urls.add(url)
elif r.code == 404:
sys.stdout.write('[{}]: Not found!\n'.format(url))
else:
sys.stdout.write('[{}]: Unexpected response code {}\n'.format(url, r.code))
tmp = None
try:
tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', dir='.', delete=False)
for url in sorted(good_urls):
tmp.write(url + "\n")
tmp.close()
os.rename(tmp.name, 'urls.txt')
tmp = None
finally:
if tmp is not None:
os.unlink(tmp.name)
You may want to add a good_urls.add(url)
to the else
clause in the first loop. If anyone knows a tidier way to do what I did with try-finally there at the end, I'd like to hear about it.
Upvotes: 0
Reputation: 298206
You could write to a second file:
import urllib2
with open('urls.txt', 'r') as urls, open('urls2.txt', 'w') as urls2:
for url in urls:
try:
r = urllib2.urlopen(url)
except urllib2.URLError as e:
r = e
if r.code in (200, 401):
print '[{}]: '.format(url), "Up!"
urls2.write(url + '\n')
elif r.code == 404:
print '[{}]: '.format(url), "Not Found!"
Upvotes: 1