Carlos Espinosa

Reputation: 5

Specific problem reading a page with BeautifulSoup

I do not want to advertise any product.

But the error is very specific and I do not know how else to ask.

I want to get the links in the menu on page A, which is the URL in the code below, but that page has another page, B, associated with it.

When I read the menu, I get the menu from page B, and I do not understand why.

In the HTML, I see that all the scripts and libraries are loaded from page B's domain.

Any suggestions?

from bs4 import BeautifulSoup
import http.cookiejar
import urllib.request

mainurl = "http://uk.example.com"

# Keep cookies between requests and send a browser-like User-Agent.
cookiejar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookiejar))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
mainPage = opener.open(mainurl)
mainPageRequest = mainPage.read()
# Name the parser explicitly so the result does not depend on what is installed.
mainPagesoup = BeautifulSoup(mainPageRequest, "html.parser")

menu = mainPagesoup.find("div", {"class": "mainNavigation_linkList_content"})
print(menu)

I want the menu from http://uk.example.com, but the program reads the menu from http://uk.example.co.uk/.

Upvotes: 0

Views: 138

Answers (1)

Joe

Reputation: 3059

urllib doesn't seem to handle the redirects the way the server is expecting.

First install requests:

pip install requests

Then try this:

import requests
from bs4 import BeautifulSoup

s = requests.Session()
mainPage = s.get("http://uk.accessorize.com")

# Name the parser explicitly so the result does not depend on what is installed.
mainPagesoup = BeautifulSoup(mainPage.text, "html.parser")
menu = mainPagesoup.find("div", {"class": "mainNavigation_linkList_content"})
print(menu)
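
To check whether a redirect really is what sends the request to the other domain, it can help to look at the hops the response went through. This is only a minimal sketch, assuming a requests-style response object (requests exposes `history`, `url` and `status_code` on its responses); the helper names `redirect_chain` and `changed_host` are made up for illustration.

```python
from urllib.parse import urlsplit


def redirect_chain(response):
    """List (status_code, url) for every hop, ending with the final page.

    Expects a requests-style response: `history` holds the intermediate
    redirect responses in order, `url` is the address finally served.
    """
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def changed_host(response, requested_url):
    """Return True if the final page lives on a different host than requested."""
    final_host = urlsplit(response.url).hostname
    requested_host = urlsplit(requested_url).hostname
    return final_host != requested_host
```

With the session above, `print(redirect_chain(mainPage))` would list each hop, and `changed_host(mainPage, "http://uk.example.com")` would tell you whether you ended up on another domain.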

Upvotes: 1
