BeautifulSoup - get attributes on the divs I'm iterating over

Question

I'm using BeautifulSoup to parse lists of companies from VC websites. I've found the right elements to iterate over, but I can't seem to get data on those elements themselves.

Here's the sample HTML I'm going through:

This is how I'm currently using BeautifulSoup and this part is working great:

portfolio = soup.find('div', attrs={'class': 'portfolio-tiles'})
for eachco in portfolio.find_all('article'):
  companyname = eachco.a['title']
  companyurl = eachco.a['href']

But what I want is the values of the class attribute from here:

or

(there are multiple variations for each company in the list)

I've tried iterating through with:

portfolio = soup.find('div', attrs={'class': 'portfolio-tiles'})
for eachco in portfolio.find_all('article'):
  companyattributes = eachco.div['class']

but that spits out rows of:

['company__thumbnail', 'company__thumbnail-link']

(i.e., one level below what I'm looking for)

How can I iterate over all of the results but get the class attribute of each result itself? I sense I'm missing something really basic, but I'd appreciate any help figuring out what it is!
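A likely explanation for the "one level down" result: indexing a tag (eachco['class']) reads that tag's own attributes, while attribute-style access such as eachco.div navigates to the first <div> descendant of the tag. The minimal sketch below shows the difference. The markup is hypothetical (the original sample HTML was not preserved), and it assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the class names in this thread;
# the real page structure was not preserved.
html = """
<div class="portfolio-tiles">
  <div class="company company-stage--venturegrowth company-type--enterprise">
    <div class="company__thumbnail company__thumbnail-link">
      <a href="https://example.com/acme" title="Acme">Acme</a>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# class_="company" matches any div that has "company" as one of its classes.
outer = soup.find("div", class_="company")

# Indexing the tag itself reads *its own* attributes.
print(outer["class"])
# `.div` navigates to the first <div> *descendant*, i.e. one level down.
print(outer.div["class"])
```

Because class is a multi-valued attribute, both lookups return a list of class names rather than a single string.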

UPDATE

I ended up going with the following, which got everything working together:

import re

portfolio = soup.find_all('div', class_=re.compile("company company-"))
for eachco in portfolio:
    coname = eachco.a['title']
    courl = eachco.a['href']
    # The div's own classes, as a list
    cotypes = eachco['class']
    costage = cotypes[1]
    comarket = cotypes[2]
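Here is a sketch of the same loop that reads the stage and market by class prefix instead of by position, assuming the classes follow the company-stage--* and company-type--* naming seen in this thread. Positional indexing (cotypes[1], cotypes[2]) silently returns the wrong value if a page lists its classes in a different order. The markup and the class_with_prefix helper are hypothetical, and bs4 is assumed to be installed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; class names follow the pattern seen in this thread.
html = """
<div class="company company-stage--venturegrowth company-type--enterprise">
  <a href="https://example.com/acme" title="Acme">Acme</a>
</div>
<div class="company company-type--consumer company-stage--seed">
  <a href="https://example.com/beta" title="Beta">Beta</a>
</div>
"""


def class_with_prefix(classes, prefix):
    """Hypothetical helper: return the first class starting with prefix, else None."""
    return next((c for c in classes if c.startswith(prefix)), None)


soup = BeautifulSoup(html, "html.parser")
rows = []
# For multi-valued attributes like class, BeautifulSoup tries the regex against
# each class and also against the space-joined class string, which is why
# "company company-" can match.
for eachco in soup.find_all("div", class_=re.compile("company company-")):
    classes = eachco.get("class", [])
    rows.append({
        "name": eachco.a["title"],
        "url": eachco.a["href"],
        "stage": class_with_prefix(classes, "company-stage--"),
        "market": class_with_prefix(classes, "company-type--"),
    })

for row in rows:
    print(row)
```

The second hypothetical div lists its classes in a different order; the prefix lookup still finds the right values, whereas cotypes[1] would return the type instead of the stage.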

KunduK · Accepted Answer

You can use the re module to match a particular substring in the class attribute.

from bs4 import BeautifulSoup
import re
html = """
    
        
        
            
            
                    (
                        
                    

            

            
                    
                        
                    

            
 """

soup = BeautifulSoup(html, 'html.parser')
divs = soup.find_all('div', class_=re.compile("stage"))
for div in divs:
    print(div['class'])

Output:

[u'company', u'company-stage--venturegrowth', u'company-type--enterprise', u'company--single-company']
[u'company', u'company-stage--venturegrowth', u'company-type--enterprise', u'company--single-company']
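Once you have the class list, you can pull out the stage and type names themselves with a capturing regex. This is a sketch that assumes classes of the form company-stage--<name> and company-type--<name>, as in the output above; the markup and the parse_company_classes helper are hypothetical, and bs4 is assumed to be installed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup reusing the class names from the output above.
html = """
<div class="company company-stage--venturegrowth company-type--enterprise company--single-company"></div>
<div class="company company-stage--seed company-type--consumer"></div>
"""

# Captures e.g. ("stage", "venturegrowth") from "company-stage--venturegrowth".
CLASS_PATTERN = re.compile(r"company-(stage|type)--(.+)")


def parse_company_classes(classes):
    """Hypothetical helper: map 'stage'/'type' to the suffix of matching classes."""
    info = {}
    for cls in classes:
        m = CLASS_PATTERN.fullmatch(cls)
        if m:
            info[m.group(1)] = m.group(2)
    return info


soup = BeautifulSoup(html, "html.parser")
results = [parse_company_classes(div["class"])
           for div in soup.find_all("div", class_=re.compile("stage"))]
for info in results:
    print(info)
```

Classes that don't fit the pattern, such as company or company--single-company, are simply ignored.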
