dovla

Reputation: 311

Can't display the content between span tags

Here is my code so far: http://pastebin.com/CdUiXpdf

import requests
from bs4 import BeautifulSoup


def web_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.kupindo.com/Knjige/artikli/1_strana_" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        print("PAGE: " + str(page))
        for link in soup.find_all("a", class_="item_link"):
            href = link.get("href")
            # title = link.string
            print(href)
            # print(title)
            extended_crawler(href)
        page += 1


def extended_crawler(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for view_counter in soup.find_all("span", id="BrojPregleda"):
        print("View Count: ", view_counter.text)


web_crawler(1)

For example, the output is:

PAGE: 1
https://www.kupindo.com/showcontent/2143/Beletristika/37875219_VUK-DRASKOVIC-Izabrana-dela-1-7-Srpska-rec
View Count:  

So the View Count is empty: even though the extended_crawler function looks for the span with the id BrojPregleda, nothing is displayed.

Upvotes: 0

Views: 41

Answers (1)

Zroq

Reputation: 8382

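That's because the span with the ID BrojPregleda is filled in by an AJAX call after the page loads, so it is still empty in the HTML that requests downloads (requests does not run JavaScript). Either use Selenium to get the value or follow these steps: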

1) Extract the product ID from the item URL

2) Send a POST request to http://www.kupindo.com/inc/ajx/Predmet/ajxGetBrojPregleda.php with a single form field, IDPredmet, set to the ID from step 1

3) Read the view count from the response

Example:

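def extended_crawler(item_url):
    # The item ID is the number before the first "_" in the last path segment,
    # e.g. ".../37875219_VUK-DRASKOVIC-Izabrana-dela-1-7-Srpska-rec" -> "37875219"
    item_id = item_url.rsplit('/', 1)[-1].split('_', 1)[0]
    # The view count comes from a separate AJAX endpoint, not from the item page,
    # so the item page itself does not need to be downloaded and parsed here
    view_count = requests.post('http://www.kupindo.com/inc/ajx/Predmet/ajxGetBrojPregleda.php',
                               data={'IDPredmet': item_id})
    print("View Count: ", view_count.text)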
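The same three steps can also be split into small helpers so each one can be checked on its own. This is a minimal sketch, not the answer's code: it uses the standard library's urllib instead of requests so it has no dependencies, the helper names (extract_item_id, build_view_count_request, fetch_view_count) are hypothetical, the endpoint URL is the one given above and may have changed since, and it assumes (unverified) that the response body is just the count as text.

```python
import re
import urllib.parse
import urllib.request

# Endpoint taken from the answer above; it may no longer exist in this form.
VIEW_COUNT_URL = "http://www.kupindo.com/inc/ajx/Predmet/ajxGetBrojPregleda.php"


def extract_item_id(item_url):
    """Step 1: the ID is the run of digits before the first '_' in the last path segment."""
    path = urllib.parse.urlparse(item_url).path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    match = re.match(r"(\d+)_", last_segment)
    if match is None:
        raise ValueError(f"no item ID found in {item_url!r}")
    return match.group(1)


def build_view_count_request(item_id):
    """Step 2: a form-encoded POST with the single field IDPredmet."""
    body = urllib.parse.urlencode({"IDPredmet": item_id}).encode("ascii")
    # urlopen adds Content-Type: application/x-www-form-urlencoded for a data body
    return urllib.request.Request(VIEW_COUNT_URL, data=body, method="POST")


def fetch_view_count(item_url, timeout=10):
    """Step 3: send the request and return the response body as stripped text.

    Assumption: the endpoint answers with the view count as plain text.
    """
    request = build_view_count_request(extract_item_id(item_url))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace").strip()
```

Calling fetch_view_count(href) inside the loop in web_crawler would take the place of the span lookup. Unlike plain string slicing, extract_item_id raises ValueError when a URL does not follow the NUMBER_title pattern instead of silently posting a wrong ID.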

Upvotes: 1
