Reputation: 952
BeautifulSoup is easy to use if you know the class(es) you want to target. I, however, am scraping a site that changes the classes used in its HTML periodically. (Presumably, to keep people like me from doing what I'm trying to do.)
To counter-attack this defensive maneuver, I want to use BeautifulSoup to look at the structure of the page and suss out which classes I am interested in. In effect: "find a div with class 'A' that has twenty divs with class 'B' as immediate children," and from that determine the strings "A" and "B".
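One way to express that structural query is sketched below, assuming bs4 with the built-in "html.parser". The helper name find_parent_child_classes and its min_children threshold are hypothetical; it treats "twenty" as "at least twenty", only considers parent divs that have a class, and counts a child with several classes once under each of them.

```python
from collections import Counter
from bs4 import BeautifulSoup

def find_parent_child_classes(html, min_children=20):
    """Hypothetical helper: return (parent_class, child_class, count) triples
    where a div with parent_class has at least min_children immediate div
    children that share child_class."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for parent in soup.find_all("div"):
        # recursive=False restricts the search to immediate children only
        children = parent.find_all("div", recursive=False)
        # a child's "class" value is a list, so multi-class divs count under each class
        counts = Counter(cls for child in children for cls in child.get("class", []))
        for child_class, count in counts.items():
            if count >= min_children:
                # parents without a class attribute yield nothing here
                for parent_class in parent.get("class", []):
                    results.append((parent_class, child_class, count))
    return results

html = '<div class="A x">' + '<div class="B">item</div>' * 20 + "</div>"
print(find_parent_child_classes(html))  # [('A', 'B', 20), ('x', 'B', 20)]
```

Because the match is based on shape rather than names, it should keep working after the site renames its classes, as long as the layout stays the same.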
I'm reasonably certain BS can be used to accomplish this. I'm also reasonably certain that it would be faster to get help from the community than to try to figure this out on my own. I could recursively build a tree of divs, which may be the best solution, but I'm not sure how to account for the fact that some divs belong to multiple CSS classes.
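A recursive tree of divs can be sketched as follows, assuming bs4 and "html.parser". The function name div_tree is hypothetical. Multiple classes are not a problem in practice, because BeautifulSoup returns the class attribute as a list; storing it as a tuple keeps every class a div belongs to. Non-div wrappers (such as body or section) are skipped over, so only div-to-div nesting is recorded.

```python
from bs4 import BeautifulSoup

def div_tree(node):
    """Hypothetical helper: build a list of (classes, subtree) pairs for the
    divs nested under `node`, which is a BeautifulSoup object or a div Tag."""
    # at the document root there is no enclosing div, so the owner is None
    owner = node if node.name == "div" else None
    tree = []
    for div in node.find_all("div"):
        # keep only divs whose closest enclosing div is `node` itself
        if div.find_parent("div") is owner:
            # div.get("class") is a list, so a multi-class div keeps all its classes
            tree.append((tuple(div.get("class", [])), div_tree(div)))
    return tree

html = ('<body><div class="A x"><section><div class="B">1</div></section>'
        '<div class="B y">2</div></div></body>')
print(div_tree(BeautifulSoup(html, "html.parser")))
# [(('A', 'x'), [(('B',), []), (('B', 'y'), [])])]
```

Each level rescans its descendants, so this is quadratic in the worst case; that is usually fine for a single page.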
Upvotes: 1
Views: 251
Reputation: 15508
Yes, you can use find_all with class_=True (which matches every element that has a class attribute) and a list comprehension:
classes = [cls for element in soup.find_all(class_=True) for cls in element["class"]]
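classes is a list of all the class names on the page: an element with several classes contributes each of them, and a class used by many elements appears once per element.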
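Because classes keeps one entry per use, counting it can hint at which class marks a repeated item (for example, twenty sibling divs). A minimal sketch, assuming bs4 with "html.parser"; class_counts is a hypothetical helper name and the HTML is made-up sample input.

```python
from collections import Counter
from bs4 import BeautifulSoup

def class_counts(html):
    """Hypothetical helper: count how many elements use each class name."""
    soup = BeautifulSoup(html, "html.parser")
    # the comprehension from the answer: element["class"] lists that element's classes
    classes = [cls for element in soup.find_all(class_=True) for cls in element["class"]]
    return Counter(classes)

html = """
<div class="list">
  <div class="item new">a</div>
  <div class="item">b</div>
  <div class="item">c</div>
</div>
"""
print(class_counts(html).most_common(1))  # [('item', 3)]
```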
Upvotes: 2