fana it

Reputation: 63

Python 2.7 : unknown url type: urllib2 - BeautifulSoup

Import libraries:

import urllib2
from bs4 import BeautifulSoup

New libraries:

import csv
import requests 
import string

Defining variables:

i = 1
str_i = str(i)
seqPrefix = 'seq_'
seq_1 = str('https://anyaddress.com/')
quote_page = seqPrefix + str_i

#Then, make use of the Python urllib2 to get the HTML page of the url declared.

# query the website and return the html to the variable 'page'
page = urllib2.urlopen(quote_page)  


#Finally, parse the page into BeautifulSoup format so we can use BeautifulSoup to work on it.

# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, 'html.parser')

Everything looks fine, except that the script fails with:

ERROR MESSAGE:

    page = urllib2.urlopen(quote_page)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 423, in open
    protocol = req.get_type()
  File "C:\Python27\lib\urllib2.py", line 285, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: seq_1

Why?

Thanks.

Upvotes: 0

Views: 458

Answers (2)

Rakesh

Reputation: 82785

It looks like you need to concatenate seq_1 and str_i, rather than seqPrefix and str_i.

Ex:

seq_1 = str('https://anyaddress.com/')
quote_page = seq_1 + str_i

Output:

https://anyaddress.com/1
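
A minimal sketch of the difference, using the variable names from the question. It makes no network request; it only prints the string each concatenation would pass to urlopen. The name `name_only` is introduced here for illustration.

```python
i = 1
str_i = str(i)
seqPrefix = 'seq_'
seq_1 = 'https://anyaddress.com/'

# Original code: this builds the *name* of a variable as plain text.
# The result is the string 'seq_1', which has no URL scheme.
name_only = seqPrefix + str_i

# Suggested fix: this builds on the *value* stored in seq_1.
quote_page = seq_1 + str_i

print(name_only)
print(quote_page)
```

The first string is what produced "unknown url type: seq_1" in the traceback; the second starts with https:// and is something urlopen can work with.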

Upvotes: 1

Dan-Dev

Reputation: 9440

You can look the variable up by name in the dictionary returned by vars():

page = urllib2.urlopen(vars()[quote_page])

As written, the code tried to open the literal string "seq_1" as the URL, rather than the value of the seq_1 variable, which is a valid URL.
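
If the aim is to choose between several numbered URLs, an ordinary dictionary is a common alternative to looking names up through vars(). The sketch below assumes that intent; `pages` and `url_for` are hypothetical names that do not appear in the question. It also checks that the string has a scheme before it would be passed to urlopen, and makes no network request.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.7

# Hypothetical mapping from sequence number to URL,
# replacing the separate seq_1, seq_2, ... variables.
pages = {
    1: 'https://anyaddress.com/',
}

def url_for(i):
    """Return the URL stored under key i, rejecting strings with no scheme."""
    url = pages[i]
    if not urlparse(url).scheme:
        raise ValueError('not a full URL: %r' % url)
    return url

quote_page = url_for(1)
print(quote_page)
```

A string such as "seq_1" has an empty scheme, so a check like this would flag it before urlopen raises "unknown url type".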

Upvotes: 2
