I would like to get the text content of all <p> tags on a web page, so I wrote this code:
from selenium import webdriver

driver = webdriver.Firefox()
href_list = []
href_p_dict = {}
for i in range(1, 11):
    get_link = f"https://rifey.ru/news?page={i}"
    driver.get(get_link)
    e_list = driver.find_elements_by_class_name('block-link')
    for e in e_list:
        href_list.append(e.get_attribute('href'))
for href in href_list:
    driver.get(href)
    content = driver.find_elements_by_tag_name('p')
    href_p_dict.update({href: content})
    print(href, content)
But the output looks like this:
https://rifey.ru/news/list/id_102034 [<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="480831d9-04ed-4443-99de-3a7f16ff3c9c")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="c1e11246-d799-4cfd-bdfb-f1e85f3eaeab")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="4cf97837-fa83-466c-a07d-11b9121b314e")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="89807888-63b3-478d-9af8-92d3155d2197")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="0a5b148a-07cb-46eb-bd63-93fa0a2c7339")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="e36ad0f0-7b5d-4781-9a34-b5c4c198fa97")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="080de5d0-dbab-4059-afcd-120b039dc4b8")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="3a1c5678-be15-4205-97e4-7cfcb8267717")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="400ee932-f79f-40a0-acc0-e0e93832cbb5")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="c32ed757-2646-48c0-9542-ae6c26da20ca")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="1602487b-8bab-42ad-84fd-cf9c4a60506e")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="1a745694-5e46-4676-9aed-7623e758697a")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="75246d26-78b9-4f3a-bb8e-0d61b466ed95", element="5f80d5a6-85af-48b4-9dab-b7d8442c826c")>]
I expected to get the text inside each <p> tag instead. I tried changing my code to this:
content = driver.find_elements_by_tag_name('p').text
and this:
href_p_dict.update({href: content.text})
But both attempts fail with the same error:
Traceback (most recent call last):
  File "/home/alyferryhalo/Documents/code/work/rifey_parser.py", line 17, in <module>
    content = driver.find_elements_by_tag_name('p').text
AttributeError: 'list' object has no attribute 'text'
How can I fix it?
I use these:
You cannot do
content = driver.find_elements_by_tag_name('p').text
because find_elements (plural) returns a Python list of WebElement objects, one per matching tag.
A list has no text attribute; .text is defined on each individual WebElement, not on the list. So the error you are seeing is exactly what Python should report:
AttributeError: 'list' object has no attribute 'text'
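To make the distinction concrete without launching a browser, here is a small sketch. FakeElement is a hypothetical stand-in for a Selenium WebElement (it only carries a .text attribute), and the plain list plays the role of what find_elements returns.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement: it only has .text."""

    def __init__(self, text: str) -> None:
        self.text = text


# find_elements(...) hands back an ordinary Python list, like this one.
elements = [FakeElement("First paragraph"), FakeElement("Second paragraph")]

try:
    elements.text  # the list itself has no .text attribute
except AttributeError as exc:
    error_message = str(exc)

# .text lives on each element, so index into the list or iterate over it.
first_text = elements[0].text
all_texts = [element.text for element in elements]
```

The same rule applies to the second attempt in the question: content.text fails for the same reason, because content is still the list.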
To resolve this, loop over the list and read .text from each element:
for con in driver.find_elements_by_tag_name('p'):
    print(con.text)
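Applied to the loop in the question, a sketch might look like the following. It assumes Selenium 4, where the find_elements_by_* helpers were removed (in 4.3) in favour of find_elements(by, value); the strings "class name" and "tag name" are the values of By.CLASS_NAME and By.TAG_NAME, spelled out here so the block has no import dependency. The function names collect_article_links and collect_paragraphs are hypothetical, and the "block-link" class is taken from the question rather than checked against the site. The sketch also stores the paragraph text instead of the WebElement objects: elements found on one page typically go stale once driver.get() loads another, so reading .text from them later would likely raise StaleElementReferenceException.

```python
from typing import Dict, Iterable, List

# Same values as selenium's By.CLASS_NAME and By.TAG_NAME, so
# find_elements(CLASS_NAME, ...) behaves like find_elements(By.CLASS_NAME, ...).
CLASS_NAME = "class name"
TAG_NAME = "tag name"

LIST_URL = "https://rifey.ru/news?page={}"


def collect_article_links(driver, pages: Iterable[int] = range(1, 11)) -> List[str]:
    """Visit each news listing page and gather the article hrefs."""
    links: List[str] = []
    for page in pages:
        driver.get(LIST_URL.format(page))
        for element in driver.find_elements(CLASS_NAME, "block-link"):
            links.append(element.get_attribute("href"))
    return links


def collect_paragraphs(driver, hrefs: List[str]) -> Dict[str, List[str]]:
    """Map each article URL to the text of every <p> element on that page."""
    href_p_dict: Dict[str, List[str]] = {}
    for href in hrefs:
        driver.get(href)
        # Read .text from each element while this page is still loaded;
        # the WebElement objects themselves are not kept.
        href_p_dict[href] = [p.text for p in driver.find_elements(TAG_NAME, "p")]
    return href_p_dict
```

With a real browser this would be driven roughly as driver = webdriver.Firefox() followed by href_p_dict = collect_paragraphs(driver, collect_article_links(driver)).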