Nevo

Reputation: 952

Finding all classes in HTML with BeautifulSoup

BeautifulSoup is easy to use if you know the class(es) you want to target. I, however, am scraping a site that changes the classes used in its HTML periodically. (Presumably, to keep people like me from doing what I'm trying to do.)

To counter this defensive maneuver, I want to use BeautifulSoup to look at the structure of the page and suss out which classes I am interested in. In effect, "find a div with class A that has twenty divs with class B as immediate children," and from that determine the strings A and B.

I'm reasonably certain BS can be used to accomplish this. I'm also reasonably certain that it would be faster to get help from the community than to try to figure this out on my own. I could recursively build a tree of divs, which may be the best solution, but I'm not sure how to account for the fact that some divs belong to multiple CSS classes.

Upvotes: 1

Views: 251

Answers (1)

wasif

Reputation: 15508

Yes, you can use find_all with class_=True, which matches every tag that has a class attribute, together with a list comprehension:

classes = [cls for element in soup.find_all(class_=True) for cls in element["class"]]

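classes is a list of all the class names in the document, with one entry per occurrence, so wrap it in set() if you only want the distinct names. (The loop variable is named cls because class is a reserved word in Python.)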
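For the structural part of the question (a div with class A whose immediate children include twenty divs with class B), here is a minimal sketch rather than a definitive solution. The function name find_repeating_containers and the min_children parameter are made up for illustration, and it assumes the page is parsed with the built-in "html.parser". It relies on two things BeautifulSoup does provide: recursive=False limits find_all to direct children, and a tag's "class" value is a list, which is how a div with several classes gets counted under each of them.

```python
from collections import Counter

from bs4 import BeautifulSoup


def find_repeating_containers(html, min_children=20):
    """Return (parent_classes, child_class, count) for every div whose
    immediate div children share one class at least min_children times.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for parent in soup.find_all("div", class_=True):
        # recursive=False restricts the search to immediate children.
        children = parent.find_all("div", class_=True, recursive=False)
        # tag["class"] is a list, so a child with several classes is
        # counted once under each of its class names.
        counts = Counter(cls for child in children for cls in child["class"])
        for child_class, count in counts.items():
            if count >= min_children:
                matches.append((parent["class"], child_class, count))
    return matches
```

Calling find_repeating_containers(page_html) returns one tuple per candidate pair, so the current values of "A" and "B" can be read off the result even after the site renames its classes. If the number of children varies, lower min_children and pick the candidate with the highest count.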

Upvotes: 2
