Amen Aziz

Reputation: 779

AttributeError when using BeautifulSoup select()

My script raises an AttributeError with the message "You're probably treating a list of elements like a single element". This is the page link: https://www.avocats-lille.com/fr/annuaire/avocats-du-tableau-au-barreau-de-lille?view=entries

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep

headers ={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
base_url='https://www.avocats-lille.com/'
url = 'https://www.avocats-lille.com/fr/annuaire/avocats-du-tableau-au-barreau-de-lille?view=entries'
driver = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe")
driver.get(url)
soup = BeautifulSoup(driver.page_source, "html.parser")
tra = soup.find_all('h2',class_='title')
productlinks=[]
for links in tra:
    for link in links.find_all('a',href=True):
        comp=base_url+link['href']
        productlinks.append(comp)
        
for link in productlinks:
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    tel=soup.select('.address+ .contact p').text
    email=soup.select('.contact a').text
    print(tel,email)

Upvotes: 0

Views: 119

Answers (2)

HedgeHog

Reputation: 25241

Instead of select(), which returns a ResultSet, use select_one() to get a single element (the first match):

soup.select_one('.address+ .contact p').text

Otherwise, iterate over the ResultSet or pick an element by index:

soup.select('.address+ .contact p')[0].text
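
To make the difference concrete, here is a minimal sketch that runs without a browser. The HTML snippet is hypothetical (it only imitates the class names used in the selector above, not the real page), so treat it as an illustration rather than the site's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one detail page (an assumption).
html = """
<div class="address">1 rue Exemple</div>
<div class="contact"><p>Tél. 03 20 00 00 00<br>Fax. 03 20 00 00 01</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

matches = soup.select(".address + .contact p")    # ResultSet (list-like)
first = soup.select_one(".address + .contact p")  # a single Tag, or None

# A ResultSet has no .text, which is the error from the question.
try:
    matches.text
    raised = False
except AttributeError:
    raised = True

# A single Tag does expose its text; guard against None for missing matches.
first_text = first.get_text(" ", strip=True) if first else None
indexed_text = matches[0].get_text(" ", strip=True) if matches else None
```

Both `first_text` and `indexed_text` hold the same string here; `select_one()` is simply the shorter way to ask for the first match.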

EDIT

Based on your comment, there are different approaches to reach the goal. Note: all of these use the walrus operator, which requires Python >= 3.8, to shorten the lines; otherwise use normal if statements.

Regex: Best in my opinion, if you do not know whether tel or fax is the first element.

import re

t.group(1) if (t:=re.search(r'Tél\. ([\d\s]*)', soup.select_one('.address+ .contact p').text)) else None
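
The same idea can be sketched with a plain if statement, which also works on Python older than 3.8. The helper name `extract_tel` and the sample strings are assumptions for illustration; the pattern is a slightly stricter variant of the one above (escaped dot, at least one digit required).

```python
import re

# Matches "Tél." followed by a run of digits and whitespace.
TEL_PATTERN = re.compile(r"Tél\.\s*(\d[\d\s]*)")

def extract_tel(text):
    """Return the phone number after 'Tél.' or None if there is none."""
    match = TEL_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).strip()

# Hypothetical contact texts: tel first, fax first, and no tel at all.
tel_first = extract_tel("Tél. 03 20 00 00 00\nFax. 03 20 00 00 01")
fax_first = extract_tel("Fax. 03 20 00 00 01 Tél. 03 20 00 00 00")
missing = extract_tel("Fax. 03 20 00 00 01")
```

Because the search looks for the "Tél." label rather than a position, the order of tel and fax in the paragraph does not matter.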

Slicing: Only if tel is the first element.

t.contents[0][5:] if (t:=soup.select_one('p:-soup-contains("Tél")')) else None

Replacing: Only if tel is the first element.

t.contents[0].replace('Tél.','').strip() if (t:=soup.select_one('p:-soup-contains("Tél")')) else None 

Upvotes: 1

Md. Fazlul Hoque

Reputation: 16187

The CSS selector

soup.select_one('.contact + div > p:nth-child(1) > span > a')

will extract the email address

and

soup.select_one('.address+ .contact p') 

will grab the paragraph that holds the telephone number.

Full working code as an example:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from selenium.webdriver.chrome.options import Options


options = Options()
options.add_argument("--no-sandbox")
options.add_argument("start-maximized")
#options.add_experimental_option("detach", True)
s = Service("./chromedriver") #Your chromedriver path

driver = webdriver.Chrome(service= s, options=options)
base_url='https://www.avocats-lille.com/'
url = 'https://www.avocats-lille.com/fr/annuaire/avocats-du-tableau-au-barreau-de-lille?view=entries'

driver.get(url)
time.sleep(1)

soup = BeautifulSoup(driver.page_source, "html.parser")
tra = soup.find_all('h2',class_='title')
productlinks=[]
for links in tra:
    for link in links.find_all('a',href=True):
        comp=base_url+link['href']
        productlinks.append(comp)
        
for link in productlinks:
    driver.get(link)
    time.sleep(1)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    t=soup.select_one('p:-soup-contains("Tél.")')
    tel = t.next_element.replace('Tél.', '').strip() if t else None
    mail=soup.select_one('.contact + div > p:nth-child(1) > span > a')
    email = mail.text if mail else None
    print(tel,email)

Output:

03 28 07 30 11 [email protected]
03 28 36 94 42 [email protected]
03 59 09 68 95 [email protected]
03 20 74 98 81 [email protected]
03 20 74 22 33 [email protected]
03 20 54 81 55 [email protected]
06 31 20 89 94 [email protected]
03 20 02 98 60 [email protected]
06 33 34 28 04 [email protected]
03 20 21 45 45 [email protected]
03 20 74 16 73 [email protected]
03 20 13 01 07 [email protected]
06 79 42 61 53 [email protected]
03 20 14 93 43 [email protected]
03 28 52 95 00 [email protected]
07 56 95 80 48 [email protected]
03 28 66 81 74 [email protected]

... and so on
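
A side note on building the detail-page links: base_url ends with a slash, so plain concatenation produces a double slash whenever an href starts with one. Whether the hrefs on this site are relative or absolute is an assumption here (the sample paths below are made up), but urljoin from the standard library handles all the cases in this sketch. The helper name `absolute_link` is hypothetical.

```python
from urllib.parse import urljoin

base_url = "https://www.avocats-lille.com/"

def absolute_link(href):
    """Resolve an href against base_url without doubling slashes."""
    return urljoin(base_url, href)

# Hypothetical href shapes that a link might have.
rooted = absolute_link("/fr/annuaire/exemple")
relative = absolute_link("fr/annuaire/exemple")
already_absolute = absolute_link("https://www.avocats-lille.com/fr/annuaire/exemple")

# Plain concatenation, as in the loop above, for comparison.
concatenated = base_url + "/fr/annuaire/exemple"
```

In the loop, `comp = urljoin(base_url, link['href'])` would be a drop-in replacement for `base_url+link['href']`.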

Upvotes: 0
