Luck Box

Reputation: 90

How to scrape attributes containing string characters (python, beautifulsoup)

I am trying to scrape all href attributes that match ^/album$. When I print my result, I get an empty list. I have tried find() and findAll() with re.compile and re.search, but I cannot get anything other than an empty list.

Code:

vk_urls = soup.find_all('a')
vk_albums = soup.findAll(text='^/album$')
print(vk_albums)

Result:

[]

Desired Result:

/album...
/album...
/album...

Upvotes: 0

Views: 67

Answers (1)

saidalkharusi

Reputation: 76

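You need to use href= instead of text= (or string=, its newer name in Beautiful Soup 4) to filter by the content of the href attribute. text and string search the text inside tags, not attribute values. Also note that a plain string passed to either filter is matched literally, so a pattern has to be wrapped in re.compile to be treated as a regular expression.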
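To make the difference concrete, here is a minimal sketch. The HTML snippet is made up for illustration (it is not the asker's page), and the variable names by_text, by_plain_href and by_regex_href are hypothetical. It assumes bs4 is installed and uses the built-in html.parser.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<a href="/album123">First</a>
<a href="/album456">Second</a>
<a href="/photos">Photos</a>
"""
soup = BeautifulSoup(html, "html.parser")

# string= (text= in older code) is compared with the text inside tags,
# and a plain str must match exactly, so nothing is found here.
by_text = soup.find_all(string="^/album$")

# A plain str passed to href= is also an exact match, not a pattern.
by_plain_href = soup.find_all("a", href="^/album")

# A compiled pattern is searched for in each href value.
by_regex_href = soup.find_all("a", href=re.compile("^/album"))

print(len(by_text), len(by_plain_href))
print([a["href"] for a in by_regex_href])
```

Only the last call returns tags, because it is the only one that both targets the href attribute and treats the value as a regular expression.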

To find all anchor tags whose href attribute starts with /album, drop the trailing $ (with it, the pattern only matches an href that is exactly /album) and do the following:

import re

vk_albums = soup.find_all("a", href=re.compile("^/album"))
print(vk_albums)

You can then loop through this list to print just the href attributes:

for album in vk_albums:
    print(album['href'])
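For a version you can run on its own, the sketch below applies the same filter to a small made-up document and collects the href values into a list. The sample HTML and the names album_links and css_links are assumptions for illustration. The second query uses a CSS attribute selector through select(), which is an alternative to the regex filter rather than part of the approach above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; replace with the real page source.
html = """
<a href="/album111">One</a>
<a href="/audio">Audio</a>
<a>No href</a>
<a href="/album222">Two</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Regex filter: keeps <a> tags whose href starts with /album.
# Tags without an href attribute are skipped.
album_links = [a["href"] for a in soup.find_all("a", href=re.compile("^/album"))]

# Alternative: CSS selector, where ^= means "attribute value starts with".
css_links = [a["href"] for a in soup.select('a[href^="/album"]')]

for link in album_links:
    print(link)
```

Both queries select the same two links in this sample; which one to use is mostly a matter of taste, though the regex form is more flexible if the pattern later needs more than a prefix check.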

Upvotes: 1
