Reputation: 2978
I want to scrape "Table:" & "Release date: " from the URL: https://www150.statcan.gc.ca/n1/en/type/data?geoname=A0002&p=0#
I am using the Selenium web driver to scrape.
Below are the tags present in the source.
<ul>
# Some HTML Data
</ul>
<ul data-offset="0">
<li class="ndm-item">
# Some HTML Tags
</ul>
<ul>
# Some HTML Tags
</ul>
I want to get the details from the SECOND "ul" tag, the one where "data-offset" is present.
for Class_L1 in Soup.findAll('ul', {'data-offset': "0"}):
    for Class_L2 in Class_L1('li', {'class': 'ndm-item'}):
        for Class_L3 in Class_L2('div', {'class': 'ndm-result-container'}):
            for Class_L4 in Class_L3.findAll('div', {'class': 'ndm-result-productid'}):
                Table = str(Class_L4.get_text()).strip()
                print(Table)
            for Class_L4 in Class_L3.findAll('div', {'class': 'ndm-result-date'}):
                Release_Date = str(Class_L4.get_text()).strip()
                print(Release_Date)
The problem is that the source contains multiple 'ul' tags with data-offset="0"; I just want the details from the SECOND 'ul' tag that has data-offset="0".
Upvotes: 0
Views: 606
Reputation: 84465
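You can narrow the match with a positional CSS selector such as nth-of-type or nth-child; the code below uses :nth-child(2) inside the #all container. This is based on: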
I want to scrape "Table:" & "Release date: " from the URL
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www150.statcan.gc.ca/n1/en/type/data?geoname=A0002&p=0'
driver = webdriver.Chrome()
driver.get(url)
tableInfo = [table.text for table in WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#all .ndm-result-productid")))]
dates = [date.text for date in WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#all .ndm-result-date:nth-child(2)")))]
results = list(zip(tableInfo, dates))
print(results)
driver.quit()
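If you would rather stay with BeautifulSoup, as in the question, one possible approach is to collect every 'ul' that has data-offset="0" and index the second match. The following is only a minimal sketch: SAMPLE_HTML is a made-up stand-in for driver.page_source, the values in it are placeholders, and the helper name scrape_second_list is hypothetical. The class names are taken from the question's code; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; values are placeholders,
# not data from the real site.
SAMPLE_HTML = """
<ul><li>Some HTML data</li></ul>
<ul data-offset="0">
  <li class="ndm-item">
    <div class="ndm-result-container">
      <div class="ndm-result-productid">Table: first-list</div>
      <div class="ndm-result-date">Release date: 2000-01-01</div>
    </div>
  </li>
</ul>
<ul data-offset="0">
  <li class="ndm-item">
    <div class="ndm-result-container">
      <div class="ndm-result-productid">Table: second-list-A</div>
      <div class="ndm-result-date">Release date: 2000-02-02</div>
    </div>
  </li>
  <li class="ndm-item">
    <div class="ndm-result-container">
      <div class="ndm-result-productid">Table: second-list-B</div>
      <div class="ndm-result-date">Release date: 2000-03-03</div>
    </div>
  </li>
</ul>
"""


def scrape_second_list(html):
    """Return (table, release_date) pairs from the second ul[data-offset="0"]."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns matches in document order, so index 1 is the second one.
    lists = soup.find_all("ul", attrs={"data-offset": "0"})
    if len(lists) < 2:
        return []
    rows = []
    for item in lists[1].find_all("li", class_="ndm-item"):
        table = item.find("div", class_="ndm-result-productid")
        date = item.find("div", class_="ndm-result-date")
        # Skip items that lack either field instead of raising an error.
        if table is not None and date is not None:
            rows.append((table.get_text(strip=True), date.get_text(strip=True)))
    return rows


results = scrape_second_list(SAMPLE_HTML)
print(results)
```

With Selenium you would pass driver.page_source (after the page has finished loading) instead of SAMPLE_HTML. Reading both fields from the same 'li' keeps each table paired with its own date, which avoids misalignment if one entry is missing a field.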
Upvotes: 1