Using BeautifulSoup in Python to get link names and "selecting" links instead of limiting?

Question

I have the following code that tries to extract data from some HTML, but I can't get it to return what I need:

import urllib2
from bs4 import BeautifulSoup
from time import sleep

def getData():
    htmlfile = open('C:/html.html', 'rb')
    html = htmlfile.read()
    soup = BeautifulSoup(html)
    items = soup.find_all('div', class_="blocks")
    for item in items:
        links = item.find_all('h3')
        for link in links:
            print link

getData()

This returns a list like the following:


    
    TITLE STUFF HERE (YES)
    



    
    TITLE STUFF HERE (MAYBE)

I want to be able to return just the title: TITLE STUFF HERE (YES) and TITLE STUFF HERE (MAYBE)

Another thing I want to do is use soup.find_all("a", limit=2), but instead of limiting the output to the first two results, I want it to return ONLY the second link. So a "select" feature rather than a limit. Does such a feature exist?

prgao · Accepted Answer

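from bs4 import BeautifulSoup

def getData():
    # Read the saved page; the with-block closes the file afterwards
    with open('C:/html.html', 'rb') as htmlfile:
        html = htmlfile.read()
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_="blocks")
    for item in items:
        links = item.find_all('a')
        for link in links:
            # Keep only links that sit directly inside an <h3>
            if link.parent.name == 'h3':
                print(link.text)

getData()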

You can also find all the links from the very beginning and check that the parent is an h3 and that the parent's parent is a div with the class blocks.
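A minimal sketch of that alternative, assuming the page looks roughly like the hypothetical SAMPLE_HTML below (the real file is not shown in the question, so the markup, the hrefs, and the helper name title_links are all made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the structure of the real file is an assumption.
SAMPLE_HTML = """
<div class="blocks">
  <h3><a href="/yes">TITLE STUFF HERE (YES)</a></h3>
  <p><a href="/other">not a title</a></p>
  <h3><a href="/maybe">TITLE STUFF HERE (MAYBE)</a></h3>
</div>
<div class="other">
  <h3><a href="/skip">OUTSIDE BLOCKS</a></h3>
</div>
"""

def title_links(html):
    """Return <a> tags whose parent is an <h3> directly inside a div.blocks."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for link in soup.find_all("a"):
        h3 = link.parent
        if h3 is None or h3.name != "h3":
            continue  # the link is not directly inside an <h3>
        div = h3.parent
        # "class" is multi-valued, so bs4 returns it as a list of class names
        if div is not None and div.name == "div" and "blocks" in div.get("class", []):
            matches.append(link)
    return matches

links = title_links(SAMPLE_HTML)
titles = [link.get_text(strip=True) for link in links]
print(titles)

# The result is an ordinary list, so indexing picks out one specific link
second = links[1].get_text(strip=True)
print(second)
```

Since find_all also returns a list-like result, the same indexing idea should cover the "select instead of limit" part of the question: something like soup.find_all("a", limit=2)[1] gives only the second link, provided at least two links exist.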
