Reputation: 536
I am trying to create a scraper that enters a website and downloads an XML file containing a balance sheet from real estate funds.
With the code below, I open the page for a specific fund with document number 07000400000146 (at the end of the URL), filter the documents with the search bar on the page, and click to download the first document in the table using XPath.
driver.get('https://fnet.bmfbovespa.com.br/fnet/publico/abrirGerenciadorDocumentosCVM?cnpjFundo=07000400000146')
driver.find_element_by_css_selector('input[type="search"]').click()
driver.find_element_by_css_selector('input[type="search"]').send_keys('informe mensal')
time.sleep(1)
driver.find_element_by_xpath('//*[@id="tblDocumentosEnviados"]/tbody/tr[1]/td[10]/div/a[2]/i').click()
How can I create a boolean expression to check whether a row's "Data de Referência" column is 03/2020, 02/2020, or 01/2020, and then download all the available files for each of those dates?
Upvotes: 1
Views: 472
Reputation: 33384
You can put the dates in a list such as datelist = ['03/2020', '02/2020', '01/2020'] and check whether each date is present on the page using the XPath below. If the date is found, the script clicks its download link; otherwise it reports that the date is not available.
Wrap the lookup in a try..except block, and use WebDriverWait() instead of sleep().
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://fnet.bmfbovespa.com.br/fnet/publico/abrirGerenciadorDocumentosCVM?cnpjFundo=07000400000146')

# Filter the table with the page's search box.
search_box = driver.find_element(By.CSS_SELECTOR, 'input[type="search"]')
search_box.click()
search_box.send_keys('informe mensal')

datelist = ['03/2020', '02/2020', '01/2020']
for date in datelist:
    # Download link in the fifth cell after the cell that holds this date.
    xpath = ("//table[@id='tblDocumentosEnviados']//td[text()='" + date + "']"
             "/following-sibling::td[5]//a[@title='Download do Documento']")
    try:
        # Wait up to 5 seconds for the link to become clickable, then click it.
        WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, xpath))).click()
        print('file downloaded with dates available ' + date)
    except TimeoutException:
        print("No such dates available " + date)
This prints the following to the console:
No such dates available 03/2020
file downloaded with dates available 02/2020
file downloaded with dates available 01/2020
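The question asks for a single boolean expression covering several dates. One possible sketch, assuming the date cells contain exactly the text shown in the list, is to join the dates with XPath's `or` operator. The helper names `date_predicate` and `download_xpath` are hypothetical and only build strings; the table id and link title are taken from the code above.

```python
def date_predicate(dates):
    """Build an XPath boolean expression that is true for a cell
    whose text equals any one of the given dates."""
    return " or ".join("text()='{}'".format(date) for date in dates)


def download_xpath(dates):
    """Build an XPath for the download links in rows whose date cell
    matches any of the given dates (hypothetical helper)."""
    return (
        "//table[@id='tblDocumentosEnviados']"
        "//td[{}]".format(date_predicate(dates))
        + "/following-sibling::td//a[@title='Download do Documento']"
    )


if __name__ == "__main__":
    # Show the combined expression for the three dates in the question.
    print(download_xpath(["03/2020", "02/2020", "01/2020"]))
```

Passing the resulting string to `driver.find_elements(By.XPATH, ...)` should return every matching link rather than only the first one, which may be closer to "download all the available files" than waiting for a single element.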
Updated code that also requires the row's status to be Ativo:
from selenium.common.exceptions import TimeoutException

driver.get('https://fnet.bmfbovespa.com.br/fnet/publico/abrirGerenciadorDocumentosCVM?cnpjFundo=07000400000146')
search_box = driver.find_element(By.CSS_SELECTOR, 'input[type="search"]')
search_box.click()
search_box.send_keys('informe mensal')

datelist = ['03/2020', '02/2020', '01/2020']
for date in datelist:
    # Match the date cell, then a later cell whose span reads 'Ativo', then the download link.
    xpath = ("//table[@id='tblDocumentosEnviados']//td[text()='" + date + "']"
             "/following-sibling::td[.//span[text()='Ativo']]"
             "/following-sibling::td//a[@title='Download do Documento']")
    try:
        WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, xpath))).click()
        print('file downloaded with dates available ' + date)
    except TimeoutException:
        print("No such dates available " + date)
Upvotes: 1
Reputation: 4177
Please try the code below to download the documents:
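from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://fnet.bmfbovespa.com.br/fnet/publico/abrirGerenciadorDocumentosCVM?cnpjFundo=07000400000146')
driver.maximize_window()
wait = WebDriverWait(driver, 20)
# Wait for the search box to be present, then filter the table.
search_box = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'label > input')))
search_box.send_keys('informe mensal')
date_list = ['03/2020', '02/2020', '01/2020']
for date in date_list:
    button = WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH,
        "//td[text()='" + date + "']/following-sibling::td//a[@title='Download do Documento']")))
    # Click through JavaScript so an overlapping element cannot intercept the click.
    driver.execute_script("arguments[0].click();", button)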
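Waiting for one clickable element returns only the first match, so a date with several documents would yield a single download. A rough sketch of clicking every match per date follows, assuming the table has already been filtered and rendered before the call. `links_xpath` and `click_all_downloads` are hypothetical names, and the string "xpath" is the value of Selenium's `By.XPATH`, written out so the snippet needs no imports.

```python
def links_xpath(date):
    """XPath for all download links in rows whose date cell equals `date`."""
    return (
        "//td[text()='" + date + "']"
        "/following-sibling::td//a[@title='Download do Documento']"
    )


def click_all_downloads(driver, dates):
    """Click every download link for each date and return a dict
    mapping each date to the number of links clicked (hypothetical helper)."""
    clicked = {}
    for date in dates:
        # find_elements returns an empty list when nothing matches,
        # so a missing date needs no exception handling here.
        links = driver.find_elements("xpath", links_xpath(date))
        for link in links:
            # JavaScript click, as in the loop above.
            driver.execute_script("arguments[0].click();", link)
        clicked[date] = len(links)
    return clicked
```

Calling `click_all_downloads(driver, ['03/2020', '02/2020', '01/2020'])` after the search box has been filled in would then report how many files were requested for each date; a count of 0 means no row matched that date.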
Upvotes: 0