Reputation: 97
I am trying to extract some data from WebMD, but when I run my code I keep getting "None" as the output. Any idea what I am doing wrong? The number of results matches the number of links, but I do not get the links themselves.
import bs4 as bs
import urllib.request
import pandas as pd
source = urllib.request.urlopen('https://messageboards.webmd.com/').read()
soup = bs.BeautifulSoup(source,'lxml')
for url in soup.find_all('div',class_="link"):
    print (url.get('href'))
Upvotes: 1
Views: 71
Reputation: 2277
soup.find_all('div',class_="link") returns all <div> elements with the class link. These elements wrap the <a> elements that contain the href attributes, so you need to get the href from the correct element, like so:
for div in soup.find_all('div',class_="link"):
    print (div.a.get('href'))
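As a possible alternative, the anchors can be selected directly with a CSS selector, so the loop never touches the wrapping <div> at all. This is only a sketch: SAMPLE_HTML and extract_links are hypothetical names, the markup is a made-up stand-in for the live page, and html.parser is used instead of lxml so that nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the live page (assumption).
SAMPLE_HTML = """
<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>
<div class="link"><a href="https://example.com/second/">Second</a></div>
<div class="link">No anchor here</div>
"""


def extract_links(html):
    """Return the href of every <a> that is a direct child of div.link."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.link > a[href]" matches only anchors that carry an href attribute,
    # so divs without a link are skipped automatically.
    return [a["href"] for a in soup.select("div.link > a[href]")]


links = extract_links(SAMPLE_HTML)
for link in links:
    print(link)
```

The selector does the filtering here, which avoids an error on any div.link that happens not to contain an anchor.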
Upvotes: 0
Reputation: 8254
Your url element is actually a <div> tag, not an <a>:
>>> x = soup.find_all('div', class_="link")
>>> x[0]
<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>
You need to select the child before getting the href attribute:
>>> x[0].a.get('href')
'https://messageboards.webmd.com/family-pregnancy/f/relationships/'
Just modify your for loop as follows:
for url in soup.find_all('div',class_="link"):
    print (url.a.get('href'))
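To see why the original loop printed None, and what happens if some div.link has no anchor inside it, here is a small offline sketch. The SAMPLE_HTML string is a hypothetical stand-in for the real page, and html.parser is assumed in place of lxml to keep the example dependency-light.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the last div deliberately has no <a> child.
SAMPLE_HTML = """
<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>
<div class="link">No anchor here</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
divs = soup.find_all("div", class_="link")

# The <div> itself has no href attribute, so .get() falls back to None.
div_href = divs[0].get("href")
print(div_href)

hrefs = []
for div in divs:
    anchor = div.find("a")  # None when the div has no <a> descendant
    if anchor is None:
        continue            # skip instead of raising AttributeError
    href = anchor.get("href")
    if href:
        hrefs.append(href)

print(hrefs)
```

The explicit None check is optional if every div.link is known to wrap an anchor, but it keeps the loop from failing on unexpected markup.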
Upvotes: 1