nosh

Reputation: 692

How to read a URL in Python and then print each URL on the website?

I am trying to figure out how to read in only the lines from a website that contain a URL. Every time I run the code I get the error:

AttributeError: module 'urllib' has no attribute 'urlopen'

My code is below:

import os
import subprocess
import urllib

datasource = urllib.urlopen("www.google.com")

while 1:
        line = datasource.readline()
        if line == "": break
        if (line.find("www") > -1) :
                print (line)


li = ['www.apple.com', 'www.google.com']
os.chdir('..')
os.chdir('..')
os.chdir('..')
os.chdir('Program Files (x86)\\LinkChecker')

for s in li:
    os.system('Start .\linkchecker ' + s)

Upvotes: 1

Views: 426

Answers (3)

Steffi Keran Rani J

Reputation: 4093

The AttributeError occurs because, in Python 3, urlopen lives in the urllib.request module, so the call should be urllib.request.urlopen rather than urllib.urlopen.
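To illustrate the point, here is a minimal sketch, assuming Python 3. The submodule is imported explicitly, and the top-level package is checked for the old name. The data: URL is only a stand-in so the snippet runs without network access; it is an assumption for illustration and not part of the question.

```python
import urllib
import urllib.request  # the submodule must be imported explicitly

# In Python 3 the top-level package has no urlopen attribute,
# which is what the AttributeError in the question reports.
has_old_name = hasattr(urllib, "urlopen")

# Assumption: a data: URL stands in for a real site so no network is needed.
with urllib.request.urlopen("data:text/plain,hello") as response:
    body = response.read()  # the response body is bytes, not str

print(has_old_name, body)
```

With a real site, the data: URL would be replaced by a full URL that includes the scheme.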

Apart from the AttributeError mentioned in the question, I ran into two more errors.

  1. ValueError: unknown url type: 'www.google.com'

    Solution: Rewrite the line defining datasource so that the URL includes the https:// scheme:

    datasource = urllib.request.urlopen("https://www.google.com")

  2. TypeError: a bytes-like object is required, not 'str', raised by the line if (line.find("www") > -1) :

    Solution: Convert the bytes read from datasource to a string before calling find on it.
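The second error comes up because the response yields bytes, so searching it with a str argument fails. One possible fix, sketched below, is to decode each line before searching it. The io.BytesIO object and its contents are made-up stand-ins for the real response, and UTF-8 is an assumed encoding.

```python
import io

# Hypothetical stand-in for the object returned by urlopen(); it yields bytes.
datasource = io.BytesIO(b"<a href='https://www.google.com'>\n<p>plain text</p>\n")

matches = []
for raw_line in datasource:
    # Decode bytes to str before using str operations on the line.
    line = raw_line.decode("utf-8", errors="replace")
    if "www" in line:
        matches.append(line.strip())

print(matches)
```

Iterating over the response line by line also avoids the manual readline loop and its end-of-file check.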

The overall solution code is:

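import os
import urllib.request

# urlopen needs a full URL, including the scheme
datasource = urllib.request.urlopen("https://www.google.com")

# The response yields bytes, so decode each line before searching it
# (UTF-8 is assumed here; undecodable bytes are replaced)
for raw_line in datasource:
    line = raw_line.decode("utf-8", errors="replace")
    if "www" in line:
        print(line)

li = ['www.apple.com', 'www.google.com']
os.chdir('..')
os.chdir('..')
os.chdir('..')
os.chdir('Program Files (x86)\\LinkChecker')

for s in li:
    os.system('Start .\\linkchecker ' + s)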

Upvotes: 0

itzMEonTV

Reputation: 20349

You appear to be using Python 3.x, so you should use

urllib.request.urlopen

Upvotes: 0

danglingpointer

Reputation: 4920

This is a very simple example.

This works in Python 3.2 and greater.

import urllib.request
with urllib.request.urlopen("http://www.apple.com") as url:
    r = url.read()
print(r)
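To go from the raw page to the individual URLs the question asks about, one option is the standard library's html.parser module. The sketch below is only an illustration: LinkCollector is a hypothetical helper name, the sample bytes stand in for the value returned by url.read(), and UTF-8 is an assumed encoding.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up sample standing in for the bytes returned by url.read().
r = b'<a href="http://www.apple.com/mac">Mac</a> <a href="/iphone">iPhone</a>'

collector = LinkCollector()
collector.feed(r.decode("utf-8", errors="replace"))
for link in collector.links:
    print(link)
```

Relative links such as /iphone are printed as found; resolving them against the page address is left out of this sketch.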

For reference, see the related question "Urlopen attribute error".

Upvotes: 1
