How to scrape attributes containing string characters (python, beautifulsoup)

Question

I am trying to scrape all href attributes that match the pattern ^/album$. When I print my result, I get an empty list. I have tried find() and findAll() with re.compile and re.search, but I cannot get anything other than an empty list.

Code:

vk_urls = soup.find_all('a')
vk_albums = soup.findAll(text='^/album$')
print(vk_albums)

Result:

[]

Desired Result:

/album...
/album...
/album...

saidalkharusi · Accepted Answer

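To filter by the value of the href attribute, use href= instead of text= (or string=, its newer name in Beautiful Soup 4). The text and string arguments search the text inside tags, not their attributes. Also note that a plain string such as '^/album$' is compared literally; to match a pattern, pass a compiled regular expression (re.compile).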
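To make the difference concrete, here is a minimal sketch. The HTML snippet and the variable names are made up for illustration and are not taken from the page in the question; it assumes a Beautiful Soup version that accepts string= (4.4.0 or later).

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the first link has "/album" in its href,
# the second has "/album" only in its visible text.
html = '<a href="/album123">Holiday photos</a><a href="/video9">/album</a>'
soup = BeautifulSoup(html, "html.parser")

# string= tests the text inside the tag
by_text = soup.find_all("a", string=re.compile("^/album"))
# href= tests the value of the href attribute
by_href = soup.find_all("a", href=re.compile("^/album"))

text_hrefs = [a["href"] for a in by_text]
attr_hrefs = [a["href"] for a in by_href]

print(text_hrefs)  # ['/video9']
print(attr_hrefs)  # ['/album123']
```

Only the href= filter picks up the link whose address begins with /album.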

To find all anchor tags whose href attribute starts with /album, do the following. The trailing $ from your pattern is dropped, because ^/album$ would only match an href that is exactly /album:

import re

vk_albums = soup.find_all("a", href=re.compile("^/album"))
print(vk_albums)

You can then loop through this list to print just the href attributes:

for album in vk_albums:
    print(album['href'])
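Putting it together, the sketch below runs on its own against a small made-up HTML snippet (the links are hypothetical, not the real page). It also shows what the $ anchor does to the match and, as an alternative that is not part of the answer above, the CSS attribute selector a[href^="/album"] via select(), which assumes Beautiful Soup 4.7 or later.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<a href="/album">All albums</a>
<a href="/album123_456">Summer</a>
<a href="/video789">Clip</a>
<a>No href here</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Prefix match: every href that starts with /album
prefix = [a["href"] for a in soup.find_all("a", href=re.compile(r"^/album"))]

# With the $ anchor, only an href that is exactly /album matches
exact = [a["href"] for a in soup.find_all("a", href=re.compile(r"^/album$"))]

# Alternative: CSS "starts with" attribute selector
css = [a["href"] for a in soup.select('a[href^="/album"]')]

print(prefix)  # ['/album', '/album123_456']
print(exact)   # ['/album']
print(css)     # ['/album', '/album123_456']
```

Anchors without an href attribute are skipped by all three queries, so indexing with a["href"] is safe here.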
