Reputation: 2397
I am scraping http://www.bing.com/videos/search?q=kohli and trying to look up the video URLs.
Each anchor tag contains a YouTube link, but the link sits inside a dictionary-like string in one of the tag's attributes, and that is the value I want to extract.
import urllib
import urllib2
from bs4 import BeautifulSoup  # assumption: bs4; BeautifulSoup 3 uses "from BeautifulSoup import BeautifulSoup"

# word holds the search term, e.g. "kohli"
redditFile = urllib2.urlopen("http://www.bing.com/videos?q=" + urllib.quote_plus(word))
redditHtml = redditFile.read()
redditFile.close()
soup = BeautifulSoup(redditHtml)
productDivs = soup.findAll('div', attrs={'class': 'dg_u'})
for div in productDivs:
    print div.find('a')['vrhm']  # This element contains youtube urls but print does not display it
    thumb = div.find('div', {"class": "vthumb", 'smturl': True})
    if thumb is not None:
        print thumb['smturl']  # this gives link to micro video
How can I get the YouTube link from the vrhm attribute of the <a> tag?
Upvotes: 1
Views: 145
Reputation: 26667
You can use json.loads to load a dictionary from a JSON string.
The for loop can be modified as follows:
>>> import json
>>> productDivs = soup.findAll('div', attrs={'class': 'dg_u'})
>>> for div in productDivs:
...     a_dict = json.loads(div.a['vrhm'])
...     print a_dict['p']
...
https://www.youtube.com/watch?v=bWbrWI3PBss
https://www.youtube.com/watch?v=bWbrWI3PBss
https://www.youtube.com/watch?v=PbTx2Fjth-0
https://www.youtube.com/watch?v=pB1Kjx-eheY
..
..
What does it do?
div.a['vrhm'] extracts the vrhm attribute of the first a tag found inside the div (div.a is shorthand for div.find('a')).
a_dict = json.loads( div.a['vrhm'] ) parses the JSON string and creates the dictionary a_dict.
print a_dict['p'] prints the value stored under the key p, which is the YouTube URL in the output above. a_dict is an ordinary Python dictionary, so use it as you would any other.
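If some anchors lack the vrhm attribute or hold something other than JSON, the loop above would raise an error. The following is only a rough sketch of a more defensive version. It is written for Python 3 and, to stay self-contained, swaps BeautifulSoup for the standard library's html.parser. The names VrhmCollector, extract_urls and SAMPLE_HTML, as well as the sample markup, are made up for illustration. Only the dg_u class, the vrhm attribute and the p key come from the thread.

```python
import json
from html.parser import HTMLParser


class VrhmCollector(HTMLParser):
    """Collect the raw vrhm attribute of every <a> tag (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.raw_values = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            value = dict(attrs).get("vrhm")
            if value is not None:  # skip anchors without the attribute
                self.raw_values.append(value)


def extract_urls(html):
    """Return the 'p' entry of each vrhm attribute that holds a JSON object."""
    collector = VrhmCollector()
    collector.feed(html)
    urls = []
    for raw in collector.raw_values:
        try:
            data = json.loads(raw)
        except ValueError:  # attribute is not valid JSON; skip it
            continue
        if isinstance(data, dict) and "p" in data:
            urls.append(data["p"])
    return urls


# Hypothetical markup: the real page structure may differ.
SAMPLE_HTML = """
<div class="dg_u"><a vrhm='{"p": "https://www.youtube.com/watch?v=bWbrWI3PBss"}'>one</a></div>
<div class="dg_u"><a vrhm='not json'>two</a></div>
<div class="dg_u"><a href="#">three</a></div>
"""

for url in extract_urls(SAMPLE_HTML):
    print(url)
```

The same try/except around json.loads can be dropped into the BeautifulSoup loop unchanged. Use div.a.get('vrhm') instead of div.a['vrhm'] if the attribute may be missing.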
Upvotes: 1