Reputation: 802
My code goes into a website and scrapes rows of information (title and time).
However, there is one tag ('p') that I am not sure how to get with the find_element_by_* methods.
On the website, it holds the information shown under each title.
Here is my code so far:
import time
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
driver = webdriver.Chrome()
driver.get('https://www.nutritioncare.org/ASPEN21Schedule/#tab03_19')
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
eachRow = driver.find_elements_by_class_name('timeline__item')
time.sleep(1)
for item in eachRow:
    time.sleep(1)
    title = item.find_element_by_class_name('timeline__item-title')
    tim = item.find_element_by_class_name('timeline__item-time')
    tex = item.find_element_by_tag_name('p')  # This is the part I don't know how to scrape
    print(title.text, tim.text, tex.text)
Upvotes: 0
Views: 1278
Reputation: 21
Maybe try using different find_elements_by_class... I don't use Python that much, but try this unless you already have.
Upvotes: 0
Reputation: 191
I checked the page and there are several p tags. I suggest using find_elements_by_tag_name instead of find_element_by_tag_name
(to get all the p tags, including the one you want), then iterating over the p elements, joining their text content, and calling strip on the result.
from selenium import webdriver
from bs4 import BeautifulSoup
import time
import requests
driver = webdriver.Chrome()
driver.get('https://www.nutritioncare.org/ASPEN21Schedule/#tab03_19')
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
eachRow = driver.find_elements_by_class_name('timeline__item')
time.sleep(1)
for item in eachRow:
    time.sleep(1)
    title = item.find_element_by_class_name('timeline__item-title')
    tim = item.find_element_by_class_name('timeline__item-time')
    tex = item.find_elements_by_tag_name('p')
    text = " ".join([i.text for i in tex]).strip()
    print(title.text, tim.text, text)
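One caveat with a plain join: if some of the p tags are empty, the result picks up doubled spaces. Here is a minimal sketch of a helper that drops the empty ones first. The name join_paragraph_texts is made up for illustration, and it works on plain strings, so it can be tried without a browser.

```python
def join_paragraph_texts(texts):
    """Join paragraph strings with single spaces, skipping blank ones.

    `texts` is any iterable of strings, e.g. the .text of each <p> element.
    """
    # Trim surrounding whitespace from every paragraph first.
    cleaned = (t.strip() for t in texts)
    # Keep only paragraphs that still contain something after trimming.
    return " ".join(t for t in cleaned if t)
```

Inside the loop above it could be called as text = join_paragraph_texts(p.text for p in tex).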
Upvotes: 1
Reputation: 1888
Since the webpage has several p tags, it would be better to use the .find_elements_by_tag_name() method, so that tex is a list of elements. Replace the print call in the code with the following:
print(title.text, tim.text)
for t in tex:
    if t.text == '':
        continue
    print(t.text)
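A note for anyone on a recent Selenium: the find_element_by_* / find_elements_by_* helpers used in this thread were deprecated in Selenium 4 and removed in later 4.x releases, where the calls become find_element(By.CLASS_NAME, ...) and find_elements(By.TAG_NAME, ...). Below is a rough sketch of the per-row logic in that style. The function name scrape_row is hypothetical, and the default locator strings "class name" and "tag name" are the values of By.CLASS_NAME and By.TAG_NAME, spelled out so the function itself does not need to import selenium.

```python
def scrape_row(item, by_class="class name", by_tag="tag name"):
    """Return (title, time, paragraphs) for one timeline row.

    `item` is expected to behave like a Selenium 4 WebElement, i.e. expose
    find_element(by, value) and find_elements(by, value).
    The defaults equal By.CLASS_NAME and By.TAG_NAME (assumption: Selenium 4).
    """
    title = item.find_element(by_class, "timeline__item-title").text
    when = item.find_element(by_class, "timeline__item-time").text
    # find_elements returns a list, so every <p> in the row is collected.
    paragraphs = [p.text.strip() for p in item.find_elements(by_tag, "p")]
    # Drop empty paragraphs, as the loop above does with `continue`.
    return title, when, [p for p in paragraphs if p]


# Usage with a real browser (needs selenium 4 and a Chrome driver):
# from selenium import webdriver
# from selenium.webdriver.common.by import By
# driver = webdriver.Chrome()
# driver.get('https://www.nutritioncare.org/ASPEN21Schedule/#tab03_19')
# for item in driver.find_elements(By.CLASS_NAME, 'timeline__item'):
#     title, when, paragraphs = scrape_row(item)
#     print(title, when)
#     for line in paragraphs:
#         print(line)
```

Returning the paragraphs as a list keeps both options open: print them one per line, or join them into a single string.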
Upvotes: 1