Gregory Beckwith

Reputation: 31

Python, Beautiful Soup + how to parse dynamic class?

I am new to Beautiful Soup and to Python in general. How do I select elements by a class whose value is partly dynamic (here, the productId)? Can I use a wildcard or match on part of the class, e.g. "product_summary*"?

<li class="product_summary clearfix {productId: 247559}">

</li>

I want to get the product_info and product_image (src) data nested under each product_summary list item, but I don't know how to use find_all when the class is dynamic. My goal is to insert this data into a MySQL table, so I think I need to collect everything into variables at the highest (product_summary) level. Thanks in advance for any help.

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

url = Request('http://www.shopwell.com/sodas/c/22', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(url).read()

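# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(webpage, "html.parser")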

product_info = soup.find_all("div", {"class": "product_info"})

for item in product_info:
    # Guard against a missing link instead of raising AttributeError.
    link = item.find("a", {"class": "detail_link"})
    detail_link = link.text if link else ""

    # Default every heading to "" so a missing tag cannot leave a name undefined.
    detail_link_h2 = item.h2.text.replace("\n", "") if item.h2 else ""
    detail_link_h3 = item.h3.text.replace("\n", "") if item.h3 else ""
    detail_link_h4 = item.h4.text.replace("\n", "") if item.h4 else ""

    print(detail_link_h2 + ", " + detail_link_h3 + ", " + detail_link_h4)


product_image = soup.find_all("div", {"class": "product_image"})

for item in product_image:
    img1 = item.find("img")
    # Print the image URL (the src attribute) rather than the whole tag.
    if img1:
        print(img1.get("src"))

Upvotes: 3

Views: 2277

Answers (2)

MinestoPix

Reputation: 173

I think you can use regular expressions like this:

import re
product_image = soup.find_all("div", {"class": re.compile("^product_image")})
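Applied to the list item from the question, the same idea might look like the sketch below. It assumes the class string shown in the question; the sample markup and the `product_ids` name are made up for illustration. The second regular expression pulls the numeric productId out of the class text.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: only the class string is taken from the question.
html = '<ul><li class="product_summary clearfix {productId: 247559}"></li></ul>'
soup = BeautifulSoup(html, "html.parser")

product_ids = []
# The pattern is tested against each whitespace-separated class token,
# so "^product_summary" matches the first token of the class attribute.
for li in soup.find_all("li", {"class": re.compile("^product_summary")}):
    # li["class"] is a list of tokens; join it back into one string to search it.
    class_text = " ".join(li["class"])
    match = re.search(r"productId:\s*(\d+)", class_text)
    if match:
        product_ids.append(int(match.group(1)))

print(product_ids)
```

Anchoring the pattern with `^` keeps it from matching unrelated classes that merely contain the same text later in the token.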

Upvotes: 4

Brendan Long

Reputation: 54272

Use:

soup.find_all("li", class_="product_summary")

Or just:

soup.find_all(class_="product_summary")

See the documentation for searching by CSS class.

It’s very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, “class”, is a reserved word in Python. Using class as a keyword argument will give you a syntax error. As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_
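This works for the dynamic class because Beautiful Soup treats class as a multi-valued attribute and splits it on whitespace, so matching the single token product_summary succeeds whatever else is in the attribute. A minimal sketch of collecting everything per list item, which may suit the MySQL goal: the inner markup, the image paths, and the `rows` name are assumptions, since the question only shows the li tag itself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the <li> class string comes from the question;
# the nested structure and values are assumed for illustration.
html = """
<ul>
  <li class="product_summary clearfix {productId: 247559}">
    <div class="product_image"><img src="/img/247559.jpg"></div>
    <div class="product_info"><h2>Cola</h2><h3>12 oz</h3></div>
  </li>
  <li class="product_summary clearfix {productId: 100001}">
    <div class="product_image"><img src="/img/100001.jpg"></div>
    <div class="product_info"><h2>Root Beer</h2></div>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
# Searching from each <li> keeps the info and the image of one product together.
for li in soup.find_all("li", class_="product_summary"):
    info = li.find("div", class_="product_info")
    img = li.find("img")
    rows.append({
        "name": info.h2.get_text(strip=True) if info and info.h2 else "",
        "size": info.h3.get_text(strip=True) if info and info.h3 else "",
        "src": img.get("src", "") if img else "",
    })

print(rows)
```

Each dictionary in `rows` then corresponds to one product and could be passed to a parameterized INSERT statement.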

Upvotes: 2
