Abhishek Sharma

Reputation: 193

Python BeautifulSoup

I am using Python BeautifulSoup to extract some data from a famous song site.

Here is the snippet of code:

import requests
from bs4 import BeautifulSoup


url = 'https://gaana.com/playlist/gaana-dj-bollywood-top-50-1'
res = requests.get(url)
# Retry until the server answers with HTTP 200
while res.status_code != 200:
    try:
        res = requests.get(url)  # pass the variable, not the literal string 'url'
    except requests.RequestException:
        pass
print(res)
soup = BeautifulSoup(res.text, 'lxml')
songs = soup.find_all('meta', {'property': 'music:song'})
print(songs[0])

Here is the sample output:

<Response [200]>
<meta content="https://gaana.com/song/o-saathi" property="music:song"/>

Now I want to extract the URL inside the content attribute as a string so that I can use it further in my program.

Can someone please help me?

Upvotes: 0

Views: 107

Answers (1)

Ellie Lockhart

Reputation: 172

It's in the comments, but I just want to explain: BeautifulSoup returns most results as a list or other iterable object. You show that you understand this in your code by using songs[0], but in this case each item in that list is a Tag object, which exposes its HTML attributes the way a dictionary exposes its values.

As explained in this StackOverflow post, you need to query not only songs[0] but also the attribute within it, here songs[0]['content'] (an attribute name and its value together are called a key-value pair, and looking up by key is the chief way to get data out of a dictionary).
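A minimal sketch of that lookup, assuming the same meta tag shown in the question's sample output. The inline html_doc string is a stand-in for res.text so the example runs without a network request, and the built-in "html.parser" is used here only to avoid an extra dependency:

```python
from bs4 import BeautifulSoup

# Stand-in for res.text (assumption: the page contains tags like the one in the sample output)
html_doc = (
    '<html><head>'
    '<meta content="https://gaana.com/song/o-saathi" property="music:song"/>'
    '</head></html>'
)

soup = BeautifulSoup(html_doc, "html.parser")
songs = soup.find_all("meta", {"property": "music:song"})

# Each result is a Tag; indexing it by attribute name returns the value as a string
song_url = songs[0]["content"]

# .get() returns None instead of raising KeyError if a tag lacks the attribute
all_urls = [tag.get("content") for tag in songs]

print(song_url)
```

Indexing with ['content'] raises a KeyError when the attribute is absent, so .get('content') is the safer choice when looping over every matched tag.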

Last note: while I've been a big fan of BeautifulSoup4 for basic web scraping, you may want to consider the lxml library. It's pretty well documented. To really take advantage of it you have to learn XPath expressions, which are sort of like regex for XML/HTML, but for advanced scraping it's probably the best option short of Selenium, and in my experience it returns cleaner data than bs4.
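For comparison, here is a rough sketch of the same extraction done with lxml and an XPath expression. It assumes lxml is installed, and html_doc is again a hypothetical stand-in for the downloaded page:

```python
from lxml import html

# Stand-in for res.text (assumption: same markup as in the question's sample output)
html_doc = (
    '<html><head>'
    '<meta content="https://gaana.com/song/o-saathi" property="music:song"/>'
    '</head></html>'
)

tree = html.fromstring(html_doc)

# Select the content attribute of every <meta property="music:song"> element;
# the result is a list of strings, one per matching tag
song_urls = tree.xpath('//meta[@property="music:song"]/@content')

print(song_urls)
```

The XPath selects the attribute values directly, so there is no separate step to pull the string out of each element.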

Good luck!

Upvotes: 1
