skillless

Reputation: 29

BeautifulSoup doesn't fully parse the page

    import requests
    from bs4 import BeautifulSoup as bs

    url1 = 'https://school.karelia.ru/auth/login'
    url2 = 'https://school.karelia.ru/personal-area/#diary'

    payload = {
        'login_login': 'КлочковМ',
        'login_password': 'КлочковМ7'
    }

    def getHW():
        # Use one session so the login cookies carry over to the second request.
        with requests.Session() as s:
            s.post(url1, data=payload)
            r = s.get(url2)
            soup = bs(r.content, 'html.parser')
            print(soup.find_all("div"))

    getHW()

I am trying to parse a site, and this code doesn't parse it fully. In the website's markup there are many more nested elements than in the result I get from this code:

<div class="right" id="main-region"></div>

For some reason, the div with class "right" just ends there, even though on the site it contains much more. Why could this be?

Upvotes: 0

Views: 30

Answers (1)

Banana

Reputation: 2533

It is because you called soup.find_all("div"). The div ends there with </div>, and you told BeautifulSoup to look only for div elements, so that is where it stops. To actually search by class, see for example this answer.
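
As a rough illustration of searching by class, here is a minimal sketch. The HTML string is a made-up stand-in for the page (only the "right" class and the "main-region" id come from the question; the nested "diary" and "lesson" elements are assumptions), so the real markup may well differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page; the nested elements are invented.
html = """
<div class="left">menu</div>
<div class="right" id="main-region">
  <div class="diary"><span class="lesson">Math</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Search by class: class_ (with a trailing underscore) avoids the Python keyword.
right_divs = soup.find_all("div", class_="right")

# Equivalent lookup with a CSS selector; returns the first match or None.
right_div = soup.select_one("div.right")

# Nested elements are reachable from the matched tag.
lessons = [span.get_text() for span in right_div.find_all("span", class_="lesson")]
print(len(right_divs), lessons)
```

If the matched div comes back empty, as in the output shown in the question, then the nested markup is not present in the HTML that requests received. One possible cause, not confirmed for this site, is that the page fills that element in with JavaScript, which requests does not execute.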

Upvotes: 1
