James

Reputation: 209

BeautifulSoup: Scrape list of embedded href links

I'm scraping information about the most recent trending videos at https://www.youtube.com/feed/trending. I loaded the page into BeautifulSoup, but I get an error when I try to loop over the list of divs I need to parse.

import urllib2
from bs4 import BeautifulSoup

url = 'https://www.youtube.com/feed/trending'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page,'html.parser')

#narrow in to divs with relevant meta-data
videos = soup.find_all('div',class_='yt-lockup-content')
videos[50].div.a['href'] #checking one specific DIV
>>u'user/nameofchannel' #works

Up to this point I get the information I need, but when I try to loop over all the divs (70+ on this page as of writing), I get an error related to the type of the loop variable.

for v in videos:
     videos[v].div.a['href']
>> TypeError: list indices must be integers, not Tag

How can I loop over the list of divs returned in 'videos' and print the value of videos[n].div.a['href'] for each one?

Upvotes: 1

Views: 659

Answers (1)

宏杰李

Reputation: 12168

for v in range(len(videos)):
    print(videos[v].div.a['href'])  # v is an integer index here

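To index into the videos list you need an integer, not the Tag stored in it. Writing 'for v in videos' hands you each Tag directly, so videos[v] tries to use a Tag as a list index, which raises the TypeError.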

Better:

for index, value in enumerate(videos):
    # enumerate yields (index, Tag) pairs, so the Tag can be used directly
    print(value.div.a['href'])

Much Better:

[v.div.a['href'] for v in videos]

A list comprehension is usually the idiomatic choice for this kind of task, since it iterates over the Tags directly and collects the results in one step.
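As a rough illustration of the same idea, here is a minimal sketch that runs against a small inline HTML snippet instead of the live page. The markup in SAMPLE_HTML and the helper name extract_hrefs are made up for this example, and the guard against missing nested tags is an extra precaution not in the original code; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, so no network access is needed.
SAMPLE_HTML = """
<div class="yt-lockup-content"><div><a href="/user/channel_one">One</a></div></div>
<div class="yt-lockup-content"><div><a href="/user/channel_two">Two</a></div></div>
<div class="yt-lockup-content"><p>no nested div or link here</p></div>
"""


def extract_hrefs(html):
    """Return the href of the first div > a link inside each matching div."""
    soup = BeautifulSoup(html, "html.parser")
    videos = soup.find_all("div", class_="yt-lockup-content")
    # Iterating the result list yields Tag objects, so no index is needed.
    # Entries without the nested div/a structure are skipped instead of
    # raising an error on a None attribute lookup.
    return [
        v.div.a["href"]
        for v in videos
        if v.div is not None and v.div.a is not None and v.div.a.has_attr("href")
    ]


hrefs = extract_hrefs(SAMPLE_HTML)
for href in hrefs:
    print(href)
```

The filter in the comprehension matters on real pages, where some matched divs may not contain the nested link; without it, v.div.a would be None for those entries and the subscript would fail.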

Upvotes: 2
