Cannot get the body element of an HTML page when web scraping with Python

Question

I would like to parse a website with Python's urllib library and BeautifulSoup. I wrote this:

from bs4 import BeautifulSoup
from urllib.request import HTTPCookieProcessor, build_opener
from http.cookiejar import FileCookieJar


def makeSoup(url):
    jar = FileCookieJar("cookies")
    opener = build_opener(HTTPCookieProcessor(jar))
    html = opener.open(url).read()
    return BeautifulSoup(html, "lxml")


def articlePage(url):
    return makeSoup(url)


Links = "http://collegeprozheh.ir/%d9%85%d9%82%d8%a7%d9%84%d9%87-%d9%85%d8%af%d9%84-%d8%b1%d9%82%d8%a7%d8%a8%d8%aa%db%8c-%d8%af%d8%b1-%d8%b5%d9%86%d8%b9%d8%aa-%d9%be%d9%86%d9%84-%d9%87%d8%a7%db%8c-%d8%ae%d9%88%d8%b1%d8%b4%db%8c%d8%af/"
print(articlePage(Links))

However, the response does not contain the content of the page's body tag. This is the output of my program:

cURL = window.location.href;
var p = new Date();
second = p.getTime();
GetVars = getUrlVars();

setCookie("Human" , "15421469358743" , 10);
check_coockie = getCookie("Human");

if (check_coockie != "15421469358743")
        document.write("Could not Set cookie!");
else
        window.location.reload(true);

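I think the cookie check is causing the problem: the script sets a "Human" cookie and reloads the page, but urllib does not execute JavaScript, so my program only ever receives this script instead of the article.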
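
One way to test this hypothesis is to read the cookie name and value out of the returned script and send them back by hand. The following is only a sketch under assumptions: the helper names `extract_challenge_cookie` and `fetch_with_challenge` are made up for illustration, the regular expression only matches the `setCookie("name" , "value" , ...)` form shown in the output above, and it is not confirmed that the site accepts a cookie supplied this way.

```python
import re
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Matches the call seen in the output above, e.g.
# setCookie("Human" , "15421469358743" , 10);
SET_COOKIE_RE = re.compile(r'setCookie\(\s*"([^"]+)"\s*,\s*"([^"]+)"')


def extract_challenge_cookie(html):
    """Return (name, value) from a setCookie(...) call, or None if absent."""
    match = SET_COOKIE_RE.search(html)
    return match.groups() if match else None


def fetch_with_challenge(url):
    """Fetch url; if the script page comes back, retry with the cookie it sets.

    Hypothetical helper: assumes the server only checks that the cookie
    named in the script is present on the second request.
    """
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    html = opener.open(url).read().decode("utf-8", errors="replace")

    cookie = extract_challenge_cookie(html)
    if cookie is None:
        # No challenge script found, so this is presumably the real page.
        return html

    name, value = cookie
    # An explicit Cookie header takes precedence over cookies stored in the jar.
    request = Request(url, headers={"Cookie": f"{name}={value}"})
    return opener.open(request).read().decode("utf-8", errors="replace")
```

If the second response still contains only the script, the site probably checks more than the cookie, and a tool that actually runs JavaScript (a browser driver, for example) would be needed instead.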

