Baili

Reputation: 159

Scraping content with python and selenium

I would like to extract all the league names (e.g. England Premier League, Scotland Premiership, etc.) from this website https://mobile.bet365.com/#type=Splash;key=1;ip=0;lng=1

Using the inspector tools in Chrome/Firefox, I can see that the names are located in elements like this:

<span>England Premier League</span>

So I tried this

from lxml import html

from selenium import webdriver

session = webdriver.Firefox()
url = 'https://mobile.bet365.com/#type=Splash;key=1;ip=0;lng=1'
session.get(url)
tree = html.fromstring(session.page_source)
leagues = tree.xpath('//span/text()')
print(leagues)

Unfortunately this doesn't return the desired results :-(

To me it looks like the website has different frames and I'm extracting the content from the wrong frame.

Could anyone please help me out here or point me in the right direction? Alternatively, if someone knows how to extract the information through their API, that would obviously be the superior solution.

Any help is much appreciated. Thank you!

Upvotes: 0

Views: 3653

Answers (2)

Andersson

Reputation: 52695

The required content is absent from the initial page source. It is loaded dynamically from https://mobile.bet365.com/V6/sport/splash/splash.aspx?zone=0&isocode=RO&tzi=4&key=1&gn=0&cid=1&lng=1&ctg=1&ct=156&clt=8881&ot=2

To get this content, you can use an explicit wait as below:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver

session = webdriver.Firefox()
url = 'https://mobile.bet365.com/#type=Splash;key=1;ip=0;lng=1'
session.get(url)
WebDriverWait(session, 10).until(EC.presence_of_element_located((By.ID, 'Splash')))

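# Expand every collapsed section so that its leagues are rendered.
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which was removed in Selenium 4.
for collapsed in session.find_elements(By.XPATH, '//h3[contains(@class, "collapsed")]'):
    # Accessing this property scrolls the element into view before the click
    collapsed.location_once_scrolled_into_view
    collapsed.click()

# Print the text of every span inside an event wrapper
for event in session.find_elements(By.XPATH, '//div[contains(@class, "eventWrapper")]//span'):
    print(event.text)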
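
Once the wait has succeeded, the rendered session.page_source could also be parsed offline instead of querying the browser for each element. The sketch below uses only the standard library and keeps just the spans nested inside a div with the eventWrapper class, which is narrower than selecting every span on the page. The class EventSpanParser and the SAMPLE markup are hypothetical; the real page structure may differ.

```python
from html.parser import HTMLParser


class EventSpanParser(HTMLParser):
    """Collect the text of <span> elements nested inside <div class="eventWrapper">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # number of open <div>s since entering a wrapper (0 = outside)
        self.in_span = False  # True while inside a <span> within a wrapper
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Start counting at a wrapper div, and keep counting nested divs
            if self.div_depth or "eventWrapper" in classes:
                self.div_depth += 1
        elif tag == "span" and self.div_depth:
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "span":
            self.in_span = False

    def handle_data(self, data):
        if self.in_span and data.strip():
            self.names.append(data.strip())


# Assumed, simplified stand-in for the rendered page source
SAMPLE = """
<div id="Splash">
  <span>Soccer</span>
  <div class="eventWrapper"><span>England Premier League</span></div>
  <div class="eventWrapper"><div><span>Scotland Premiership</span></div></div>
</div>
"""

parser = EventSpanParser()
parser.feed(SAMPLE)  # in practice: parser.feed(session.page_source)
league_names = parser.names
print(league_names)
```

The span outside any eventWrapper ("Soccer" in the sample) is skipped, which is the filtering that the XPath in the loop above performs in the browser.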

Upvotes: 1

thebadguy

Reputation: 2140

Hope you are looking for something like this:

from selenium import webdriver
import bs4
import time

driver = webdriver.Chrome()
url = 'https://mobile.bet365.com/#type=Splash;key=1;ip=0;lng=1'


driver.get(url)
driver.maximize_window()
# Sleep so that the page's JavaScript has time to populate the data
time.sleep(10)
pSource = driver.page_source

soup = bs4.BeautifulSoup(pSource, "html.parser")


# Print the text of every span inside each eventWrapper div
for data in soup.find_all('div', {'class': 'eventWrapper'}):
    for res in data.find_all('span'):
        print(res.text)

It prints the following data:

Wednesday's Matches
International List
Elite Euro List
UK List
Australia List
Club Friendly List
England Premier League
England EFL Cup
England Championship
England League 1
England League 2
England National League
England National League North
England National League South
Scotland Premiership
Scotland League Cup
Scotland Championship
Scotland League One
Scotland League Two
Northern Ireland Reserve League
Scotland Development League East
Wales Premier League
Wales Cymru Alliance
Asia - World Cup Qualifying
UEFA Champions League
UEFA Europa League
Wednesday's Matches
International List
Elite Euro List
UK List
Australia List
Club Friendly List
England Premier League
England EFL Cup
England Championship
England League 1
England League 2
England National League
England National League North
England National League South
Scotland Premiership
Scotland League Cup
Scotland Championship
Scotland League One
Scotland League Two
Northern Ireland Reserve League
Scotland Development League East
Wales Premier League
Wales Cymru Alliance
Asia - World Cup Qualifying
UEFA Champions League
UEFA Europa League

The only problem is that it prints the result set twice.
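
One way to work around the repeated output is to collect the names into a list instead of printing them directly, then drop the repeats while keeping the first-seen order. This is a minimal sketch, not part of the original answer: the helper unique_in_order is a hypothetical name, and the short list stands in for the scraped values. It relies on dict preserving insertion order (Python 3.7+).

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order in Python 3.7+
    return list(dict.fromkeys(items))


# Stand-in for the scraped span texts, which arrive as the same set twice
scraped = [
    "UK List",
    "England Premier League",
    "Scotland Premiership",
    "UK List",
    "England Premier League",
    "Scotland Premiership",
]

league_names = unique_in_order(scraped)
for name in league_names:
    print(name)
```

In the scraping loop, this would mean appending res.text to a list rather than printing it, and calling the helper once at the end.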

Upvotes: 2
