RRIIZZ

Reputation: 21

How to get href from tag <a> (html) by BS4?

Hi there! I can't get the href from an <a> tag with BS4. Here is my code:

import requests
from bs4 import BeautifulSoup

URL = 'https://auto.ria.com/newauto/marka-jeep/'
HEADERS = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36', 'accept':'*/*'}

def get_html(url, params=None):
    r = requests.get(url, headers=HEADERS, params=params)
    return r 

def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all(class_='proposition_area')
    cars=[]
    for item in items:
        cars.append({
            'title': item.find('h3', class_='proposition_name').get_text(strip=True),
            'link': item.find('a', class_='proposition_link').getAttribute("href")              
        })
    print(cars)

def parse():
    html = get_html(URL)
    if html.status_code == 200:
        get_content(html.text)
    else: 
        print('error')  
parse()

The traceback points to this line:

'link': item.find('a', class_='proposition_link').getAttribute("href")

Error:

AttributeError: 'NoneType' object has no attribute 'getAttribute'
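The message means find() matched nothing and returned None, so the method call was made on None rather than on a Tag. A minimal offline reproduction, assuming a made-up HTML snippet in which the link sits next to the proposition_area block instead of inside it (the real page markup may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <a> is a sibling of .proposition_area, not a descendant.
html = """
<div class="proposition">
  <div class="proposition_area"><h3 class="proposition_name">Jeep</h3></div>
  <a class="proposition_link" href="/newauto/example.html">details</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
area = soup.find(class_="proposition_area")

# find() only searches inside `area`, so no matching <a> is found.
link = area.find("a", class_="proposition_link")
print(link)  # None

try:
    # Same failure as in the question: the attribute lookup happens on None.
    link.getAttribute("href")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'getAttribute'
```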

Upvotes: 0

Views: 460

Answers (1)

MendelG

Reputation: 20038

You have two problems:

  1. You are doing items = soup.find_all(class_='proposition_area') and then, inside the loop, searching each proposition_area element for the proposition_link anchor:

    for item in items:
        cars.append({
            'title': item.find('h3', class_='proposition_name').get_text(strip=True),
            'link': item.find('a', class_='proposition_link').get("href")
        })
    

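    The <a class="proposition_link"> element is not inside the proposition_area element you are looping over, so item.find('a', class_='proposition_link') returns None and the method call on it fails. Instead, loop over the parent proposition elements, which contain both the title and the link: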

    items = soup.find_all(class_='proposition')
    

  2. Don't use .getAttribute(): that is a JavaScript DOM method, not a BeautifulSoup one. Use .get("href") instead.
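Both fixes can be sketched offline against a made-up snippet (an assumption standing in for the real auto.ria.com markup): search from the proposition card so the link is in scope, then read the attribute with .get(). The sketch also contrasts .get("href"), which returns None when the attribute is missing, with tag["href"], which raises KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each .proposition card holds both the title block and the link.
html = """
<div class="proposition">
  <div class="proposition_area"><h3 class="proposition_name">Jeep Wrangler 2021</h3></div>
  <a class="proposition_link" href="/newauto/example-1.html">details</a>
</div>
<div class="proposition">
  <div class="proposition_area"><h3 class="proposition_name">Jeep Renegade 2021</h3></div>
  <a class="proposition_link">no href on this one</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
cards = soup.find_all(class_="proposition")

# Search from the card, not from .proposition_area, so the <a> is in scope.
links = [card.find("a", class_="proposition_link") for card in cards]

# Tag.get() returns None (or a supplied default) when the attribute is absent.
hrefs = [link.get("href") for link in links]
print(hrefs)  # ['/newauto/example-1.html', None]

# Subscripting reads the same attribute but raises KeyError when it is absent.
print(links[0]["href"])  # /newauto/example-1.html
try:
    links[1]["href"]
except KeyError:
    print("tag['href'] raised KeyError")
```

.get() is the safer choice when some cards might lack an href; subscripting is fine when a missing attribute should be treated as an error.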

Here's a fully working example:

import requests
from bs4 import BeautifulSoup

URL = 'https://auto.ria.com/newauto/marka-jeep/'
HEADERS = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36', 'accept':'*/*'}

def get_html(url, params=None):
    r = requests.get(url, headers=HEADERS, params=params)
    return r

def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all(class_='proposition')
    cars=[]
    for item in items:
        cars.append({
            'title': item.find('h3', class_='proposition_name').get_text(strip=True),
            'link': item.find('a', class_='proposition_link').get("href")
        })
    print(cars)

def parse():
    html = get_html(URL)
    if html.status_code == 200:
        get_content(html.text)
    else:
        print('error')
parse()

Output:

[{'title': 'Jeep Gladiator 2021', 'link': '/newauto/auto-jeep-gladiator-1862595.html'}, {'title': 'Jeep Grand Cherokee 2021', 'link': '/newauto/auto-jeep-grand-cherokee-1859603.html'}, {'title': 'Jeep Grand Cherokee 2021', 'link': '/newauto/auto-jeep-grand-cherokee-1863650.html'}, {'title': 'Jeep Grand Cherokee 2021', 'link': '/newauto/auto-jeep-grand-cherokee-1842428.html'}, {'title': 'Jeep Renegade 2021', 'link': '/newauto/auto-jeep-renegade-1838198.html'}, {'title': 'Jeep Grand Cherokee 2021', 'link': '/newauto/auto-jeep-grand-cherokee-1853604.html'}, {'title': 'Jeep Wrangler 2021', 'link': '/newauto/auto-jeep-wrangler-1838190.html'}, {'title': 'Jeep Grand Cherokee 2021', 'link': '/newauto/auto-jeep-grand-cherokee-1811781.html'}, {'title': 'Jeep Wrangler 2021', 'link': '/newauto/auto-jeep-wrangler-1857232.html'}, {'title': 'Jeep Wrangler 2021', 'link': '/newauto/auto-jeep-wrangler-1860925.html'}, {'title': 'Jeep Grand Cherokee 2021', 'link': '/newauto/auto-jeep-grand-cherokee-1836192.html'}, {'title': 'Jeep Renegade 2021', 'link': '/newauto/auto-jeep-renegade-1857781.html'}, {'title': 'Jeep Grand Cherokee 2021', 'link': '/newauto/auto-jeep-grand-cherokee-1838297.html'}, {'title': 'Jeep Wrangler 2021', 'link': '/newauto/auto-jeep-wrangler-1860927.html'}, {'title': 'Jeep Wrangler 2021', 'link': '/newauto/auto-jeep-wrangler-1860588.html'}, {'title': 'Jeep Gladiator 2021', 'link': '/newauto/auto-jeep-gladiator-1856629.html'}, {'title': 'Jeep Renegade 2021', 'link': '/newauto/auto-jeep-renegade-1857246.html'}, {'title': 'Jeep Grand Cherokee 2021', 'link': '/newauto/auto-jeep-grand-cherokee-1857805.html'}, {'title': 'Jeep Grand Cherokee 2021', 'link': '/newauto/auto-jeep-grand-cherokee-1829808.html'}, {'title': 'Jeep Wrangler 2021', 'link': '/newauto/auto-jeep-wrangler-1862123.html'}]
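The links in that output are relative, and the parser still crashes if a single card lacks a title or link. A more defensive sketch of get_content, assuming you want to skip incomplete cards and turn each href into an absolute URL with urllib.parse.urljoin; it returns the list instead of printing it, and the usage below runs on a hypothetical snippet rather than a live request:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

URL = 'https://auto.ria.com/newauto/marka-jeep/'


def get_content(html, base_url=URL):
    """Return a list of {'title', 'link'} dicts, skipping incomplete cards."""
    soup = BeautifulSoup(html, 'html.parser')
    cars = []
    for item in soup.find_all(class_='proposition'):
        title_tag = item.find('h3', class_='proposition_name')
        link_tag = item.find('a', class_='proposition_link')
        # Skip cards where an element or the href is missing instead of crashing on None.
        if title_tag is None or link_tag is None or not link_tag.get('href'):
            continue
        cars.append({
            'title': title_tag.get_text(strip=True),
            # Resolve the relative href against the page URL.
            'link': urljoin(base_url, link_tag['href']),
        })
    return cars


# Hypothetical snippet: the second card has no link and is skipped.
sample = """
<div class="proposition">
  <div class="proposition_area"><h3 class="proposition_name">Jeep Gladiator 2021</h3></div>
  <a class="proposition_link" href="/newauto/auto-jeep-gladiator-1862595.html">more</a>
</div>
<div class="proposition">
  <div class="proposition_area"><h3 class="proposition_name">Card without a link</h3></div>
</div>
"""
print(get_content(sample))
# [{'title': 'Jeep Gladiator 2021', 'link': 'https://auto.ria.com/newauto/auto-jeep-gladiator-1862595.html'}]
```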

Upvotes: 1
