user1596502

Reputation: 21

Verify URL exists from file

I have some code that scrapes my mailbox for certain URLs. Once it finishes, it writes them to a file called links.txt.

I want to run a script against that file and get a list of all the URLs in it that are currently live. The script I have only lets me check one URL at a time:

import urllib2

for url in ["http://www.google.com"]:   # urlopen needs the scheme (http://)

    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

Upvotes: 0

Views: 541

Answers (2)

kindall

Reputation: 184345

It is trivial to make this change, given that you're already iterating over a list of URLs:

import urllib2

for url in open("links.txt"):   # change 1

    try:
        connection = urllib2.urlopen(url.rstrip())   # change 2
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

Iterating over a file returns the lines of the file (complete with line endings). We use rstrip() on the URL to strip off the line endings.
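
To see the line endings for yourself, here is a small sketch. It uses io.StringIO as a stand-in for the real file so it runs without links.txt on disk; the two example.com URLs are made up for illustration.

```python
import io

# Stand-in for links.txt; the contents are hypothetical.
fake_file = io.StringIO(u"http://example.com/a\nhttp://example.com/b\n")

# Iterating a file-like object yields one line at a time, newline included.
lines = list(fake_file)
print(repr(lines[0]))

# rstrip() with no arguments drops trailing whitespace, including the newline.
stripped = lines[0].rstrip()
print(repr(stripped))
```

Passing the unstripped line to urlopen would send the newline along as part of the URL, which is why the rstrip() call matters.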

There are other improvements you can make. For example, some will suggest you use a with statement to make sure your file is closed. This is good practice but probably not necessary in this script.
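
If you do want the with version, a minimal sketch might look like this. read_urls is a name I made up for illustration, and skipping blank lines is my own addition rather than something the question asked for. The demo writes a throwaway temp file so it does not depend on your links.txt.

```python
import os
import tempfile


def read_urls(path):
    """Return the non-empty, whitespace-stripped lines of the file at path."""
    with open(path) as f:  # the file is closed when this block exits
        return [line.strip() for line in f if line.strip()]


# Demo with a temporary file standing in for links.txt (hypothetical contents).
fd, tmp_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("http://example.com/a\n\nhttp://example.com/b\n")

urls = read_urls(tmp_path)
os.remove(tmp_path)
print(urls)
```

You would then loop over read_urls("links.txt") instead of over the open file directly.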

Upvotes: 1

Brenden Brown

Reputation: 3235

Use requests:

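import requests

filename = "links.txt"  # the file produced by the mailbox scraper

good_links = []
with open(filename) as f:
    for link in f:  # iterate over the file object f, not the builtin file
        try:
            r = requests.get(link.strip())
        except Exception:
            continue  # skip anything that could not be fetched
        good_links.append(r.url)  # final URL after following redirects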
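
One caveat: requests.get does not raise an exception for a 404 or 500 response, so the loop above keeps those links too. If "live" should mean "answered with a non-error status", you could filter on the status code. Below is a rough sketch of that idea; live_links and get_status are hypothetical names, and the status lookup is faked with a dict so the example runs without network access.

```python
def live_links(urls, get_status):
    """Return the URLs whose HTTP status code is below 400.

    get_status is a hypothetical callable: it takes a URL and returns an
    integer status code, or raises if the request fails outright.
    """
    live = []
    for url in urls:
        try:
            status = get_status(url)
        except Exception:
            continue  # unreachable host, timeout, malformed URL, ...
        if status < 400:
            live.append(url)
    return live


# Fake status table standing in for real HTTP requests (made-up URLs).
fake_statuses = {
    "http://example.com/ok": 200,
    "http://example.com/gone": 404,
}
candidates = [
    "http://example.com/ok",
    "http://example.com/gone",
    "http://example.com/unreachable",  # not in the table, so lookup raises
]
result = live_links(candidates, fake_statuses.__getitem__)
print(result)
```

With requests, the callable could be something like lambda u: requests.get(u, timeout=5).status_code. The 5 second timeout is an arbitrary choice on my part.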

You can also consider extracting the call to requests.get into a helper function:

def make_request(method, url, **kwargs):
    # Try up to 10 times and return the first response that comes back.
    for i in range(10):
        try:
            return requests.request(method, url, **kwargs)
        except requests.RequestException as e:
            # ConnectionError and HTTPError are subclasses of RequestException,
            # so this one clause covers all three cases.
            print e
    raise Exception("requests did not succeed")

Upvotes: 4
