I was trying to scrape a Tumblr archive. The div's class attribute looks like the one shown in the picture: it starts with "post post_micro". I tried using a regular expression, but it failed:
soup.find_all(class_=re.compile('^post post_micro')
I then tried passing a function to find_all as the class_ filter:
def func(x):
    if str(x).startswith('post_tumblelog'):
        return True
and used it as:
soup.find_all(class_=func)
This works and gives me what I need. But I'd like to know how to do the same with a regular expression, and why, inside func(x),
str(x).startswith('post_tumblelog')
evaluates to True when the class attribute starts with "post post_micro".
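In BeautifulSoup 4, you can use the .select() method, since it accepts CSS attribute selectors. In your case, you would use the attribute selector [class^="post_tumblelog"], which selects elements whose class attribute starts with the string post_tumblelog. Note that ^= tests the start of the whole attribute value, so for an attribute that begins with "post post_micro" you would need [class^="post post_micro"] instead.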
soup.select('[class^="post_tumblelog"]')
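A minimal sketch of how ^= and *= behave, using hypothetical markup (the div texts and the post_tumblelog_abc / post_tumblelog_xyz class names are made up for illustration). BeautifulSoup 4.7+ hands .select() to the soupsieve package, which, as far as I know, tests attribute selectors against the whole class value with its tokens joined by single spaces, not against each class separately.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first div's class attribute starts with
# "post post_micro"; the second one starts with a post_tumblelog... token.
html = """
<div class="post post_micro post_tumblelog_abc is_photo">first</div>
<div class="post_tumblelog_xyz post">second</div>
<div class="other">third</div>
"""
soup = BeautifulSoup(html, "html.parser")

# ^= checks the start of the whole attribute string, so only "second" matches.
starts_with = [d.get_text() for d in soup.select('[class^="post_tumblelog"]')]

# Matching the actual prefix of the first div's attribute string.
micro = [d.get_text() for d in soup.select('[class^="post post_micro"]')]

# *= is a substring test anywhere in the attribute string.
contains = [d.get_text() for d in soup.select('[class*="post_tumblelog"]')]

print(starts_with)  # ['second']
print(micro)        # ['first']
print(contains)     # ['first', 'second']
```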
Alternatively, you could use:
soup.find_all(class_=lambda x: x and x.startswith('post_tumblelog'))
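As for why str(x).startswith('post_tumblelog') is True: BeautifulSoup 4 treats class as a multi-valued attribute and stores it as a list of whitespace-separated tokens. A filter passed as class_ is tested against each token (and, in recent releases, against the whole space-joined string if no single token matches). So func(x) receives "post", "post_micro" and so on, and returns True as soon as one token starts with post_tumblelog. Tags with no class attribute are passed None, which is why the lambda guards with x and (and why str(x) in the question never crashes). The sketch below records what the filter receives; the markup and the post_tumblelog_abc token are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div whose class attribute starts with
# "post post_micro", plus a <p> with no class attribute at all.
html = '<div class="post post_micro post_tumblelog_abc is_photo">first</div><p>no class</p>'
soup = BeautifulSoup(html, "html.parser")

# class is stored as a list of tokens, not as one string.
print(soup.div["class"])  # ['post', 'post_micro', 'post_tumblelog_abc', 'is_photo']

seen = []  # every value BeautifulSoup passes to the filter

def func(x):
    seen.append(x)
    # x is a single class token (or None for a tag without a class
    # attribute), so guard against None before calling startswith.
    return x is not None and x.startswith("post_tumblelog")

matches = soup.find_all(class_=func)
print([t.get_text() for t in matches])  # ['first']
print(seen)  # inspect which values the filter was called with
```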
As a side note, it looks like your regular expression was missing a closing parenthesis. The following works:
soup.find_all(class_=re.compile('^post_tumblelog'))
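A short sketch comparing both regex forms on hypothetical markup. The first pattern matches a single class token. The second is the question's own pattern with the closing parenthesis restored; because it contains a space, no single token can match it, so it only succeeds against the space-joined attribute string, which recent BeautifulSoup 4 releases also try when no individual token matches.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question.
html = """
<div class="post post_micro post_tumblelog_abc is_photo">first</div>
<div class="post">second</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches any individual class token that starts with post_tumblelog.
by_token = soup.find_all(class_=re.compile(r"^post_tumblelog"))

# The question's pattern, parenthesis fixed: matched against the joined
# string "post post_micro post_tumblelog_abc is_photo".
by_joined = soup.find_all(class_=re.compile(r"^post post_micro"))

print([t.get_text() for t in by_token])   # ['first']
print([t.get_text() for t in by_joined])  # ['first']
```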