Reputation: 629
I am trying to scrape this website with Python 2.7:
http://www.motogp.com/en/Results+Statistics/
I want to scrape the main results document, the one that appears next to the blue "MotoGP Race Classification 2017" text and changes with the selected category (Event).
After that, I want to do the same for other years. So far I have:
import re
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = "http://www.motogp.com/en/Results+Statistics/"
r = urlopen(url).read()
soup = BeautifulSoup(r)
type(soup)
match = re.search(b'\"(.*?\.pdf)\"', r)
pdf_url="http://resources.motogp.com/files/results/2017/ARG/MotoGP/RAC/Classification" + match.group(1).decode('utf8')
The links look like this:
http://resources.motogp.com/files/results/2017/AME/MotoGP/RAC/Classification.pdf?v1_ef0b514c
So I also need to append the part that follows the "?" character. The main problem is how to switch from event to event to collect all the links in this format.
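For reference, the link format above can be taken apart with the standard library. This is only a sketch: `split_pdf_url` is a hypothetical helper name, and the assumption that the path always ends in year/event/category/session/file is inferred from the single example URL above.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2.7


def split_pdf_url(url):
    """Split a results link into its path parts and the token after '?'."""
    parts = urlsplit(url)
    # Assumed layout: /files/results/<year>/<event>/<category>/<session>/<file>
    segments = parts.path.strip("/").split("/")
    year, event, category, session, filename = segments[-5:]
    return {
        "year": year,
        "event": event,
        "category": category,
        "session": session,
        "filename": filename,
        "version": parts.query,  # empty string if the link has no "?"
    }


info = split_pdf_url(
    "http://resources.motogp.com/files/results/2017/AME/MotoGP/RAC/"
    "Classification.pdf?v1_ef0b514c"
)
print("%s %s" % (info["event"], info["version"]))
```

The token after "?" differs from event to event, so it probably cannot be guessed and has to be read from the page for each event.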
Upvotes: 0
Views: 1908
Reputation: 22440
Based on your description, this is how you can get those PDF links:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("http://www.motogp.com/en/Results+Statistics/")
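# Select each entry of the Event dropdown in turn
for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#event option"))):
    item.click()
    # The PDF link is the element with class "padleft5"
    elem = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "padleft5")))
    print(elem.get_attribute("href"))
    # Wait until the old link is replaced before moving on to the next event
    wait.until(EC.staleness_of(elem))
driver.quit()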
Partial output:
http://resources.motogp.com/files/results/2017/VAL/MotoGP/RAC/worldstanding.pdf?v1_8dbea75c
http://resources.motogp.com/files/results/2017/QAT/MotoGP/RAC/Classification.pdf?v1_f6564614
http://resources.motogp.com/files/results/2017/ARG/MotoGP/RAC/Classification.pdf?v1_9107e18d
http://resources.motogp.com/files/results/2017/AME/MotoGP/RAC/Classification.pdf?v1_ef0b514c
http://resources.motogp.com/files/results/2017/SPA/MotoGP/RAC/Classification.pdf?v1_ba33b120
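The printed links mix race classifications with other documents (the first line above is a world standing PDF). A minimal sketch of filtering the collected `href` values and keying them by year and event code follows. `classification_by_event` is a hypothetical helper, and the URL pattern, including the three-letter event code, is assumed from the output shown above.

```python
import re

# Pattern inferred from the sample output; other seasons or classes may differ.
LINK_RE = re.compile(
    r"/files/results/(?P<year>\d{4})/(?P<event>[A-Z]{3})/[^/]+/[^/]+/Classification\.pdf"
)


def classification_by_event(hrefs):
    """Return {(year, event): url} for links that point at a Classification.pdf."""
    found = {}
    for href in hrefs:
        match = LINK_RE.search(href or "")  # an href may be None
        if match:
            found[(match.group("year"), match.group("event"))] = href
    return found


sample = [
    "http://resources.motogp.com/files/results/2017/VAL/MotoGP/RAC/worldstanding.pdf?v1_8dbea75c",
    "http://resources.motogp.com/files/results/2017/QAT/MotoGP/RAC/Classification.pdf?v1_f6564614",
    "http://resources.motogp.com/files/results/2017/ARG/MotoGP/RAC/Classification.pdf?v1_9107e18d",
]
links = classification_by_event(sample)
for (year, event), url in sorted(links.items()):
    print("%s %s %s" % (year, event, url))
```

To use it with the loop above, append each `elem.get_attribute("href")` to a list instead of printing it, then pass that list to the helper.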
Upvotes: 2