Reputation: 27
I am trying to scrape the information below from https://www.dsmart.com.tr/yayin-akisi. However, the code below returns an empty list. Any ideas why?
<div class="col"><div class="title fS24 paBo30">NELER OLUYOR HAYATTA</div><div class="channel orangeText paBo30 fS14"><b>24 | 34. KANAL | 16 Nisan Perşembe | 6:0 - 7:0</b></div><div class="content paBo30 fS14">Billur Aktürk’ün sunduğu, yaşam değerlerini sorgulayan program Neler Oluyor Hayatta, toplumsal gerçekliğin bilgisine ulaşma noktasında sınırları zorluyor. </div><div class="subTitle paBo30 fS12">Billur Aktürk’ün sunduğu, yaşam değerlerini sorgulayan program Neler Oluyor Hayatta, toplumsal gerçekliğin bilgisine ulaşma noktasında sınırları zorluyor. </div></div>
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url="https://www.dsmart.com.tr/yayin-akisi"
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "lxml")
for link in page_soup.find_all("div", {"class":"col"}):
    print(link)
Upvotes: 0
Views: 103
Reputation: 15568
This website is populated by GET calls to its API. You can see these calls in the Network tab of your browser's (Chrome/Firefox) devtools.
import requests
URL = 'https://www.dsmart.com.tr/api/v1/public/epg/schedules'
# Parameters that you can tweak or vary in a loop,
# e.g. for page in range(1, 10): to get multiple pages
params = dict(page=1, limit=10, day='2020-04-16')
r = requests.get(URL, params=params)
assert r.ok, 'issues getting data'
data = r.json()
# data is a dictionary; pull values out of it by key
print(data)
In cases like this, using BeautifulSoup is unwarranted.
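The loop hinted at in the comment above could be sketched as follows. This is only an illustration: `fetch_pages`, its `pages` argument, and the optional `get` argument are hypothetical names introduced here, and nothing is assumed about the keys inside each returned page or about how many pages the API actually serves.

```python
URL = 'https://www.dsmart.com.tr/api/v1/public/epg/schedules'


def fetch_pages(day, pages, limit=10, get=None):
    """Return the parsed JSON body of pages 1..pages for the given day.

    `get` is a hypothetical hook: any callable with the same calling
    convention as requests.get; it defaults to requests.get itself.
    """
    if get is None:
        # Imported lazily so the function can also be run with a stub getter.
        import requests
        get = requests.get

    results = []
    for page in range(1, pages + 1):
        params = dict(page=page, limit=limit, day=day)
        r = get(URL, params=params, timeout=10)
        r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
        results.append(r.json())
    return results
```

Calling `fetch_pages('2020-04-16', pages=3)` would then return a list with one parsed JSON object per page, which you can inspect with `print` to find the keys you need.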
Upvotes: 1
Reputation: 441
This page is rendered in the browser. The HTML you are downloading contains only links to JavaScript files, which render the page content afterwards.
You can either use a real browser to render the page (Selenium, Splash, or similar tools) or work out how the page receives the data you need.
Long story short, the data rendered on this page is requested from this link: https://www.dsmart.com.tr/api/v1/public/epg/schedules?page=1&limit=10&day=2020-04-16
It is well-formatted JSON, so it is very easy to parse. I recommend downloading it with the requests module, which can return a JSON response as a dict.
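If you would rather stay with urllib, as in the question, the same idea can be sketched with the standard library only. The helper names (`build_url`, `parse_body`, `fetch_schedule`) are made up for this example, and the code assumes the response body is UTF-8 encoded JSON; no particular keys in the result are assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.dsmart.com.tr/api/v1/public/epg/schedules"


def build_url(page=1, limit=10, day="2020-04-16"):
    """Build the API link with the same query parameters as the link above."""
    query = urlencode({"page": page, "limit": limit, "day": day})
    return BASE_URL + "?" + query


def parse_body(body):
    """Decode raw response bytes (assumed UTF-8) into Python objects."""
    return json.loads(body.decode("utf-8"))


def fetch_schedule(page=1, limit=10, day="2020-04-16"):
    """Download one page of the schedule and return the parsed JSON."""
    with urlopen(build_url(page, limit, day), timeout=10) as response:
        return parse_body(response.read())
```

`fetch_schedule()` takes the place of the `urlopen` plus BeautifulSoup steps from the question: since the response is already structured data, there is no HTML left to parse.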
Upvotes: 2