user1985563
user1985563

Reputation: 197

Delete lines from a txt file after read

i'm trying to create a script which makes requests to random urls from a txt file

import urllib2

with open('urls.txt') as urls:
    for url in urls:
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e
        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!" 

But I want that when some url does 404 not found erase from the file. Each url is per line, so basically is to erase every url that does 404 not found. How to do it?!

Upvotes: 1

Views: 605

Answers (2)

zwol
zwol

Reputation: 140659

In order to delete lines from a file, you have to rewrite the entire contents of the file. The safest way to do that is to write out a new file in the same directory and then rename it over the old file. I'd modify your code like this:

import os
import sys
import tempfile
import urllib2

good_urls = set()

with open('urls.txt') as urls:
    for url in urls:
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e
        if r.code in (200, 401):
            sys.stdout.write('[{}]: Up!\n'.format(url))
            good_urls.add(url)
        elif r.code == 404:
            sys.stdout.write('[{}]: Not found!\n'.format(url))
        else:
            sys.stdout.write('[{}]: Unexpected response code {}\n'.format(url, r.code))

tmp = None
try:
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', dir='.', delete=False)
    for url in sorted(good_urls):
        tmp.write(url + "\n")
    tmp.close()
    os.rename(tmp.name, 'urls.txt')
    tmp = None
finally:
    if tmp is not None:
        os.unlink(tmp.name)

You may want to add a good_urls.add(url) to the else clause in the first loop. If anyone knows a tidier way to do what I did with try-finally there at the end, I'd like to hear about it.

Upvotes: 0

Blender
Blender

Reputation: 298206

You could write to a second file:

import urllib2

with open('urls.txt', 'r') as urls, open('urls2.txt', 'w') as urls2:
    for url in urls:
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e

        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
            urls2.write(url + '\n')
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!" 

Upvotes: 1

Related Questions