mammalianps

Reputation: 27

Web scraping returns empty list

I am trying to scrape the info below from https://www.dsmart.com.tr/yayin-akisi. However, the code below returns an empty list. Any ideas?

<div class="col"><div class="title fS24 paBo30">NELER OLUYOR HAYATTA</div><div class="channel orangeText paBo30 fS14"><b>24 | 34. KANAL | 16 Nisan Perşembe | 6:0 - 7:0</b></div><div class="content paBo30 fS14">Billur Aktürk’ün sunduğu, yaşam değerlerini sorgulayan  program  Neler Oluyor Hayatta, toplumsal gerçekliğin bilgisine ulaşma  noktasında sınırları zorluyor. </div><div class="subTitle paBo30 fS12">Billur Aktürk’ün sunduğu, yaşam değerlerini sorgulayan  program  Neler Oluyor Hayatta, toplumsal gerçekliğin bilgisine ulaşma  noktasında sınırları zorluyor. </div></div>

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url="https://www.dsmart.com.tr/yayin-akisi"
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "lxml")


for link in page_soup.find_all("div", {"class":"col"}):
    print(link)

Upvotes: 0

Views: 103

Answers (2)

Prayson W. Daniel

Reputation: 15568

This website is populated by GET calls to its API. You can see these calls in the Network tab of your browser's devtools (Chrome/Firefox). If you check, you will see that the page calls this API:


import requests

URL = 'https://www.dsmart.com.tr/api/v1/public/epg/schedules'

# Parameters you can tweak, or set in a loop,
# e.g. for page in range(1, 10): to get multiple pages

params = dict(page=1, limit=10, day='2020-04-16')

r = requests.get(URL, params=params)

assert r.ok, 'issues getting data'

data = r.json()

# data is a dictionary; use its keys to pull out the values you need

print(data)
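
The loop mentioned in the comments could look roughly like this. It is only a sketch: `page_params` and `fetch_pages` are names made up for illustration, the `timeout` value is arbitrary, and I have not checked how many pages the API actually serves for a given day.

```python
import requests

URL = "https://www.dsmart.com.tr/api/v1/public/epg/schedules"


def page_params(day, pages, limit=10):
    """Yield one query-parameter dict per page number (hypothetical helper)."""
    for page in range(1, pages + 1):
        yield dict(page=page, limit=limit, day=day)


def fetch_pages(day, pages, limit=10, timeout=10):
    """Request each page in turn and collect the decoded JSON payloads."""
    payloads = []
    for params in page_params(day, pages, limit):
        r = requests.get(URL, params=params, timeout=timeout)
        r.raise_for_status()  # stop on an HTTP error instead of parsing bad data
        payloads.append(r.json())
    return payloads
```

Calling `fetch_pages("2020-04-16", pages=3)` would then give a list with one decoded response per page, assuming the API still answers for that day.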

In cases like this, using BeautifulSoup is unwarranted.

Upvotes: 1

MadRay

Reputation: 441

This page is rendered in the browser. The HTML you're downloading contains only links to JS files, which later render the content of the page.
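
To see why `find_all` comes back empty, here is a small sketch. The HTML string is a made-up stand-in for a JavaScript-rendered page's initial response, not the real markup from dsmart.com.tr, and `find_cols` is just an illustrative helper. It uses the built-in `html.parser` so it does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical initial HTML of a JS-rendered page: a script link and an
# empty mount point. This is NOT the actual markup served by the site.
SHELL_HTML = """
<html>
  <head><script src="/static/js/app.js"></script></head>
  <body><div id="root"></div></body>
</html>
"""


def find_cols(html):
    """Return every <div class="col"> found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("div", {"class": "col"})


# The "col" divs only exist after the scripts run in a browser,
# so there is nothing to find in the raw HTML.
print(find_cols(SHELL_HTML))  # prints []
```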

You can either use a real browser to render the page (Selenium, Splash, or similar technologies), or work out how the page receives the data you need.

Long story short, the data rendered on this page is requested from this link: https://www.dsmart.com.tr/api/v1/public/epg/schedules?page=1&limit=10&day=2020-04-16

It is well-formatted JSON, so it's very easy to parse. My recommendation is to download it with the requests module, which can return the JSON response as a dict.
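
A minimal sketch of that with requests might look like the following. `schedule_url` and `fetch_schedule` are hypothetical helper names, and the timeout is an arbitrary choice. Building the URL from a params dict keeps the query string readable and makes it easy to change the day or page.

```python
import requests

URL = "https://www.dsmart.com.tr/api/v1/public/epg/schedules"


def schedule_url(day, page=1, limit=10):
    """Build the full request URL from its parameters without sending it."""
    params = dict(page=page, limit=limit, day=day)
    return requests.Request("GET", URL, params=params).prepare().url


def fetch_schedule(day, page=1, limit=10):
    """Send the GET request and decode the JSON body."""
    r = requests.get(schedule_url(day, page, limit), timeout=10)
    r.raise_for_status()  # fail loudly on an HTTP error
    return r.json()
```

`schedule_url("2020-04-16")` produces the same link as above, and `fetch_schedule("2020-04-16")` should hand back the decoded JSON, provided the endpoint still responds.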

Upvotes: 2
