Jaka M.

Reputation: 43

beautifulsoup - how to extract link from a resulting string?

I am doing my first Python project, and I got stuck with BeautifulSoup. Even after reading through the documentation and trying a number of things, I am still stuck.

I am parsing an Amazon results page and want to scrape the link of every item.

So far my code is:

import requests
from bs4 import BeautifulSoup
import time
import re

url = "http://www.amazon.de/s/ref=nb_sb_noss?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&url=search-alias%3Daps&field-keywords=gtx+980+ti+-4gb+-970+-radeon+-amd"
r = requests.get(url)
# Assumed: the post omits this line; the parser choice is a guess
soup = BeautifulSoup(r.content, "html.parser")
g_data = soup.find_all("li", {"class": "s-result-item celwidget"})

for item in g_data:
    # First matching <a> tag inside the item's first child element
    result = item.contents[0].find_all("a", {"class": "a-size-small a-link-normal a-text-normal"})[0]
    print(result)

With my code, I managed to target all items on the page (and with code not shown here, I already managed to scrape the name and price of each item), but I have problems scraping the actual link.

So the output of the above code is:

<a class="a-size-small a-link-normal a-text-normal" href="http://www.amazon.de/gp/offer-listing/B01062AE20"><span class="a-size-base a-color-price a-text-bold">EUR 759,00</span><span class="a-letter-space"></span>neu<span class="a-letter-space"></span><span class="a-color-secondary">(32 Angebote)</span><span class="a-letter-space"></span><span class="a-color-secondary a-text-strike"></span></a>

So, how do I get that http://www.amazon.de/gp/offer-listing/B01062AE20 out of there??

I tried with:

item.contents[0].find_all("a", {"class": "a-size-small a-link-normal a-text-normal"})[0].link
item.contents[0].find_all("a", {"class": "a-size-small a-link-normal a-text-normal"})[0].href
item.contents[0].find_all("a", {"class": "a-size-small a-link-normal a-text-normal"})[0].get.link()
...

But no go. I don't want to just parse the string by hand. Surely BS4 can do this out of the box, but how?

thanks in advance, Jaka

Upvotes: 2

Views: 1124

Answers (1)

alecxe

Reputation: 473763

Getting the element attribute values in BeautifulSoup is like accessing items in dictionaries:

result["href"]
result.get("href")
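To make the difference between the two forms concrete, here is a minimal sketch. It parses a trimmed copy of the tag from the question, plus a second, made-up `<a>` tag without an `href`, which is an assumption added only to show the failure case. The variable names are illustrative and not from the original post.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the tag printed in the question, followed by a
# hypothetical second link that has no href attribute.
html = (
    '<a class="a-size-small a-link-normal a-text-normal" '
    'href="http://www.amazon.de/gp/offer-listing/B01062AE20">'
    '<span class="a-size-base a-color-price a-text-bold">EUR 759,00</span></a>'
    '<a class="a-link-normal">no href here</a>'
)

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")

# Dictionary-style access: raises KeyError if the attribute is missing.
first_href = links[0]["href"]

# .get() returns None (or a supplied default) instead of raising.
missing_href = links[1].get("href")
fallback_href = links[1].get("href", "")

# href=True restricts the search to <a> tags that carry the attribute.
all_hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(first_href)
print(all_hrefs)
```

In the loop from the question, `result["href"]` is probably enough, since every matched tag in the sample output has an `href`. `result.get("href")` is the safer choice if some result items might lack a link.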

Upvotes: 3
