Reputation: 39
I'm learning web scraping and want to fetch data from a webpage that matches the CSS selector I pass to soup.select("css locators"). When I inspect the CSS locators in the browser, they highlight the correct elements, but when I use the same selector in soup.select(), it returns no matches.
I'm retrieving data from this website: https://www.prokabaddi.com/teams/bengaluru-bulls-profile-1
CSS selector used to fetch data from the above website: .si-section-header > span.si-title
With this selector, inspecting the webpage in the browser works fine, but soup.select(".si-section-header > span.si-title") returns no matches.
# code sample
import requests
from bs4 import BeautifulSoup

URL = "https://www.prokabaddi.com/teams/bengaluru-bulls-profile-1"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')
# select() returns a list of matching elements (empty if nothing matches)
a = soup.select('.si-section-header > span.si-title')
print(a)
I expect the output to contain the values that the CSS selector highlights. In this case the selector highlights 3 values, so I expect all three to be printed when I run the above code.
Upvotes: 2
Views: 564
Reputation: 9430
Depending on the data you want, much of it is returned as JSON.
import requests
# Fetch the JSON feed and decode it into Python objects
j = requests.get("https://www.prokabaddi.com/sifeeds/kabaddi/live/json/multisport_cache_25_3_pkl_0530_en_team_1.json").json()
for match in j['matches']:
    print(match)
Other URLs that may have the data you want include:
https://www.prokabaddi.com/sifeeds/kabaddi/static/json/1_team.json
https://www.prokabaddi.com/sifeeds/kabaddi/live/json/multisport_cache_25_3_0_0530_en_4.json
You can see them all by opening the developer tools, selecting the Network tab, then XHR, and refreshing the page.
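If the feed's shape changes, indexing j['matches'] directly would raise an error. Below is a minimal, defensive sketch of the same loop. It runs against a small hypothetical sample payload instead of the live feed; only the 'matches' key comes from the code above, while the fields inside each match (match_id, teams) and the helper name iter_matches are made up for illustration.

```python
import json

# Hypothetical sample payload standing in for the live feed.
# Only the top-level "matches" key is taken from the answer above;
# the fields inside each match are invented for illustration.
SAMPLE = (
    '{"matches": ['
    '{"match_id": 1, "teams": ["A", "B"]}, '
    '{"match_id": 2, "teams": ["A", "C"]}'
    ']}'
)


def iter_matches(payload):
    """Return the list under 'matches', or [] if it is missing or malformed."""
    if not isinstance(payload, dict):
        return []
    matches = payload.get("matches", [])
    return matches if isinstance(matches, list) else []


data = json.loads(SAMPLE)
for match in iter_matches(data):
    print(match)
```

With the real feed, data would instead come from requests.get(...).json(), as in the code above.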
Upvotes: 0
Reputation: 84465
Large amounts of content are added dynamically and are not present in the response to your initial request. The elements you are looking at are part of a template pulled from another resource. You can find it in the Network tab when refreshing the page.
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://www.prokabaddi.com/static-assets/kabaddi/views/kwl-team-stats-partial.html?v=1.064')
soup = bs(r.content, 'lxml')
print([i.text for i in soup.select('.si-title')])
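To illustrate why the original selector comes back empty, here is a small offline sketch. Both HTML strings are hypothetical stand-ins, not the site's actual markup: STATIC_PAGE imitates what requests might receive before any script runs, and TEMPLATE imitates a partial that already contains the titles. The helper name titles is also made up for this example.

```python
from bs4 import BeautifulSoup

SELECTOR = ".si-section-header > span.si-title"

# Hypothetical stand-in for the initial response: an empty placeholder
# that the browser would later fill in with JavaScript.
STATIC_PAGE = '<div id="team-stats"></div>'

# Hypothetical stand-in for the template partial that holds the titles.
TEMPLATE = (
    '<div class="si-section-header"><span class="si-title">Title one</span></div>'
    '<div class="si-section-header"><span class="si-title">Title two</span></div>'
)


def titles(html):
    """Return the stripped text of every element matching SELECTOR."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(SELECTOR)]


print(titles(STATIC_PAGE))  # [] because the elements are not in this HTML
print(titles(TEMPLATE))     # ['Title one', 'Title two']
```

An empty list from select() on r.content, while the browser highlights matches, is a hint that the elements are inserted after the initial page load.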
Upvotes: 2