Link extraction from website

Question

I am trying to extract some data from WebMD, but when I run my code I keep getting "None" as the output. Any idea what I am doing wrong? The number of values printed matches the number of links, but I do not get the links themselves.

import bs4 as bs
import urllib.request
import pandas as pd


source = urllib.request.urlopen('https://messageboards.webmd.com/').read()

soup = bs.BeautifulSoup(source,'lxml')

for url in soup.find_all('div',class_="link"):
    print (url.get('href'))

brianpck · Accepted Answer

Your url element is actually a div tag, not an a tag. The div itself has no href attribute, so get('href') returns None for each one:

>>> x = soup.find_all('div', class_="link")
>>> x[0]
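<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>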

You need to select the child before getting the href attribute:

>>> x[0].a.get('href')
'https://messageboards.webmd.com/family-pregnancy/f/relationships/'

Just modify your for loop as follows:

for url in soup.find_all('div',class_="link"):
    print (url.a.get('href'))
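
If some div elements with class "link" might not contain an a tag, url.a would be None and the loop above would raise an AttributeError. A minimal sketch of one way to guard against that, using a CSS selector instead of attribute access, is shown below. The SAMPLE_HTML string and the extract_links helper are hypothetical stand-ins for illustration (not the real WebMD page), and html.parser is used only so the sketch does not depend on lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>
<div class="link">No anchor here</div>
<div class="other"><a href="https://example.com/ignored">Ignored</a></div>
"""


def extract_links(html):
    """Return the href of every <a> nested inside a <div class="link">."""
    soup = BeautifulSoup(html, "html.parser")
    # The selector only matches anchors that carry an href attribute,
    # so a div without a child link is skipped instead of raising an error.
    return [a["href"] for a in soup.select("div.link a[href]")]


links = extract_links(SAMPLE_HTML)
for link in links:
    print(link)
```

With the real page, the same function could be called on the source variable from the question, assuming the page still uses the same markup.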
