user3194257

Reputation: 39

CSS selector in soup.select() returns an empty list

I'm learning web scraping and want to fetch data from a webpage that matches the CSS selector I pass to soup.select("css locators"). When I inspect the CSS locators in the webpage, they highlight the correct elements, but when I use the same selector in the soup.select() method it returns nothing (an empty list).

  1. I'm retrieving data from this website: https://www.prokabaddi.com/teams/bengaluru-bulls-profile-1

  2. CSS selector used to fetch data from the above website: .si-section-header > span.si-title

  3. With the above selector, inspecting the page in the browser works fine, but when I use the same selector in soup.select(".si-section-header > span.si-title"), it returns an empty list.

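# code sample
import requests
from bs4 import BeautifulSoup

URL = "https://www.prokabaddi.com/teams/bengaluru-bulls-profile-1"
r = requests.get(URL)  # fetch the raw HTML (requests does not run JavaScript)

soup = BeautifulSoup(r.content, 'html.parser')
# select() returns a list of matching Tag objects
a = soup.select('.si-section-header > span.si-title')
print(a)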

I expect the output to contain the elements the CSS selector highlights. In this case the selector highlights three elements, so I expect all three to be printed when I run the code above.
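One way to reproduce the symptom offline is to compare two small HTML strings: one resembling what a server might send before any JavaScript runs, and one resembling the DOM the browser shows after scripts have filled in the section. Both snippets and the "Stats A/B/C" labels are hypothetical and do not reflect the real page's markup; the point is only that soup.select() searches the HTML it is given, not what the browser renders.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an empty container plus a script that would fill it in.
raw_html = """
<html><body>
  <div id="team-stats"></div>
  <script src="/static-assets/app.js"></script>
</body></html>
"""

# Hypothetical markup resembling the DOM after the script has run.
rendered_html = """
<div id="team-stats">
  <div class="si-section-header"><span class="si-title">Stats A</span></div>
  <div class="si-section-header"><span class="si-title">Stats B</span></div>
  <div class="si-section-header"><span class="si-title">Stats C</span></div>
</div>
"""

selector = ".si-section-header > span.si-title"

raw_matches = BeautifulSoup(raw_html, "html.parser").select(selector)
rendered_matches = BeautifulSoup(rendered_html, "html.parser").select(selector)

print(len(raw_matches))  # no matches: the elements are not in the raw HTML
print([m.get_text() for m in rendered_matches])
```

If the selector only matches in the second case, the data has to come from whatever request the page's JavaScript makes, not from the initial HTML.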

Upvotes: 2

Views: 564

Answers (2)

Dan-Dev

Reputation: 9430

Depending on which data you want, much of it is returned as JSON by separate requests the page makes.

import requests

j = requests.get("https://www.prokabaddi.com/sifeeds/kabaddi/live/json/multisport_cache_25_3_pkl_0530_en_team_1.json").json()

for match in j['matches']:
    print(match)

Other URLs that may have the data you want include:

https://www.prokabaddi.com/sifeeds/kabaddi/static/json/1_team.json

https://www.prokabaddi.com/sifeeds/kabaddi/live/json/multisport_cache_25_3_0_0530_en_4.json

You can see them all by opening the browser's developer tools, selecting the Network tab, filtering by XHR, and refreshing the page.
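Once you have found an endpoint in the Network tab, its JSON structure may not be obvious. The helper below is a hypothetical sketch (the name `outline` and the sample data are illustrative, not part of the site's feed) that lists dotted key paths in parsed JSON so you can see where fields such as `matches` live. In practice you would pass it the result of `requests.get(...).json()`.

```python
import json


def outline(obj, prefix="", depth=2):
    """Return dotted key paths found in parsed JSON, up to `depth` levels deep."""
    paths = []
    if depth == 0:
        return paths
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.append(path)
            paths.extend(outline(value, path, depth - 1))
    elif isinstance(obj, list) and obj:
        # Sample only the first element to show the shape of list items.
        paths.extend(outline(obj[0], f"{prefix}[0]", depth - 1))
    return paths


# Hypothetical sample; the real feed's keys may differ.
sample = json.loads('{"matches": [{"id": 1, "venue": "X"}], "meta": {"season": 7}}')
for p in outline(sample, depth=3):
    print(p)
```

Lists consume one level of `depth` for the `[0]` step, so a larger `depth` may be needed for deeply nested feeds.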

Upvotes: 0

QHarr

Reputation: 84465

Much of the content is added dynamically by JavaScript, so it is not present in the HTML returned by your initial request. The elements you are looking at are part of a template pulled from another resource. You can find it in the Network tab when you refresh the page.

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.prokabaddi.com/static-assets/kabaddi/views/kwl-team-stats-partial.html?v=1.064')
soup = bs(r.content, 'lxml')
print([i.text for i in soup.select('.si-title')])
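The snippet above passes 'lxml' as the parser, which only works if the third-party lxml package is installed; otherwise BeautifulSoup raises bs4.FeatureNotFound. A minimal sketch of a fallback to Python's built-in parser follows; the helper name `make_soup` and the HTML fragment are hypothetical stand-ins for the downloaded template.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml if it is installed, otherwise fall back to html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not installed; use Python's built-in parser instead
        return BeautifulSoup(markup, "html.parser")


# Hypothetical fragment standing in for the template's content.
snippet = '<div class="si-section-header"><span class="si-title">Team Stats</span></div>'
soup = make_soup(snippet)
titles = [t.get_text() for t in soup.select(".si-title")]
print(titles)
```

For simple templates like this one, both parsers should find the same `.si-title` elements; lxml is mainly faster and more lenient with badly formed markup.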

Upvotes: 2
