Reputation: 546
I am trying to scrape a website for news on Real Estate funds. My code finishes with "Process finished with exit code 0", but none of the content is printed.
driver.get('https://fiis.com.br/atualizacoes/')
time.sleep(2)
driver.find_element_by_xpath('//*[@id="reports-list"]/div/div[2]/div/div[1]/div/div[1]/a[1]/div').click()
driver.find_element_by_link_text(d_N1_2).click()
d_N1_2_click = driver.find_element_by_link_text(d_N1_2)
elements = d_N1_2_click.find_elements_by_tag_name('li')
time.sleep(1)
for elem in elements:
    print(elem.get_attribute("primary-title"))
    print(elem.get_attribute("href"))
    print(elem.get_attribute("secondary-title"))
Here is the HTML:
<li>
  <a href="http://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=84320&amp;flnk" target="_blank">
    <span class="primary-title">FISD11</span>
    <span class="secondary-title">Informe Mensal - 02/2020</span>
    <input type="hidden" class="report-content">
  </a>
</li>
I am doing this so I can later add the results to a dataframe, with the columns in this order: primary-title, secondary-title, href.
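For reference, a minimal sketch of that final step, assuming pandas (the example row is hypothetical, taken from the HTML snippet above):

import pandas as pd

# one hypothetical row: (primary-title, secondary-title, href)
rows = [("FISD11", "Informe Mensal - 02/2020",
         "http://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=84320&flnk")]

df = pd.DataFrame(rows, columns=["primary-title", "secondary-title", "href"])
print(df)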
d_N1_2 is defined as:
today = datetime.date.today()
five_day = datetime.timedelta(days=-1)
d_N1 = today + five_day
d_N1_2 = d_N1.strftime('%d.%m.%y')
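A quick worked example of what that produces, using a hypothetical fixed date instead of today():

import datetime

today = datetime.date(2020, 3, 7)           # hypothetical date, for illustration only
d_N1 = today + datetime.timedelta(days=-1)  # 2020-03-06, i.e. one day back
print(d_N1.strftime('%d.%m.%y'))            # prints 06.03.20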
Upvotes: 1
Views: 1407
Reputation: 12255
How it should be:
import datetime
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
with webdriver.Chrome() as driver:
    wait = WebDriverWait(driver, 10)
    driver.get('https://fiis.com.br/atualizacoes/')
    # wait until the date selector is clickable, then open it
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[data-type="date"]'))).click()

    # yesterday's date, formatted to match the data-item attribute
    today = datetime.date.today()
    five_day = datetime.timedelta(days=-1)
    d_N1 = today + five_day
    d_N1_2 = d_N1.strftime('%Y-%m-%d')

    # //a[normalize-space(.)='06.03.20']
    # click the list entry whose data-item matches yesterday's date
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, f'li[data-item="{d_N1_2}"]'))).click()
    elements = wait.until(EC.visibility_of_any_elements_located((By.CSS_SELECTOR, f'li[data-item="{d_N1_2}"] li')))

    for elem in elements:
        driver.execute_script('arguments[0].scrollIntoView()', elem)
        wait.until(EC.visibility_of(elem))
        print(elem.find_element_by_css_selector('span.primary-title').text,
              elem.find_element_by_css_selector('span.secondary-title').text,
              elem.find_element_by_css_selector('a').get_attribute("href"))
Upvotes: 1
Reputation:
Untested, but you can try something along the lines of the following:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import datetime, time
today = datetime.date.today()
five_day = datetime.timedelta(days=-1)
d_N1 = today + five_day
d_N1_2 = d_N1.strftime('%d.%m.%y')
print(d_N1_2)

driver = webdriver.Firefox()
driver.get('https://fiis.com.br/atualizacoes/')
time.sleep(2)
driver.find_element_by_xpath('//*[@id="reports-list"]/div/div[2]/div/div[1]/div/div[1]/a[1]/div').click()
time.sleep(4)
driver.find_element_by_link_text(d_N1_2).click()
time.sleep(4)
d_N1_2_click = driver.find_element_by_link_text(d_N1_2)
elements = d_N1_2_click.find_elements_by_tag_name('li')
print(elements)  # if empty, you'll need to fix something above
time.sleep(1)

for elem in elements:
    a = elem.find_element_by_tag_name('a')
    print(a.find_element_by_css_selector("span.primary-title").text)
    print(a.get_attribute("href"))
    print(a.find_element_by_css_selector("span.secondary-title").text)
Upvotes: 1