Beautiful Soup line matching

Question

I'm trying to build an HTML table that contains only the table header and the row that is relevant to me. The site I'm using is http://wolk.vlan77.be/~gerben.

I'm trying to get the table header and my own table entry so I don't have to search for my own name each time.

What I want to do :

get the html page
Parse it to get the header of the table
Parse it to get the line with table tags relevant to me (so the table row containing lucas)
Build a html page that shows the header and table entry relevant to me

What I am doing now :

get the header with beautifulsoup first
get my entry
add both to an array

pass this array to a method that generates a string that can be printed as html page

def downloadURL(self):
    global input
    filehandle = self.urllib.urlopen('http://wolk.vlan77.be/~gerben')
    input = ''
    for line in filehandle.readlines():
        input += line
    filehandle.close()

def soupParserToTable(self,input):
    global header

    soup = self.BeautifulSoup(input)
    header = soup.first('tr')
    tableInput='0'

    table = soup.findAll('tr')
    for line in table:
        print line
        print '\n \n'
        if '''lucas''' in line:
            print 'true'
        else:
            print 'false'
        print '\n \n **************** \n \n'

I want to get the line from the HTML file that contains lucas. However, when I run it like this I get the following in my output:

 **************** 


lucas.vlan77.be V V V 



false

I don't get why it doesn't match; the string lucas is clearly in there :/

johnsyweb · Accepted Answer

It looks like you're over-complicating this.

Here's a simpler version...

>>> import BeautifulSoup
>>> import urllib2
>>> html = urllib2.urlopen('http://wolk.vlan77.be/~gerben')
>>> soup = BeautifulSoup.BeautifulSoup(html)
>>> print soup.find('td', text=lambda data: data.string and 'lucas' in data.string)
lucas.vlan77.be
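As for why the original test printed false: as far as I can tell, `in` on a Beautiful Soup tag checks the tag's direct children rather than doing a substring search on its text, so 'lucas' never equals a whole `<td>` child. Below is a rough sketch of the full header-plus-row idea under some assumptions: it uses Beautiful Soup 4 (`bs4`) on Python 3 instead of the Beautiful Soup 3 / Python 2 setup above, the sample HTML and its column names are made up to stand in for the real page, and `find_rows` and `build_page` are hypothetical helper names.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 on Python 3

# Hypothetical stand-in for the real page; the column names are invented.
SAMPLE_HTML = """
<table>
<tr><th>Host</th><th>A</th><th>B</th><th>C</th></tr>
<tr><td>gerben.vlan77.be</td><td>V</td><td>X</td><td>V</td></tr>
<tr><td>lucas.vlan77.be</td><td>V</td><td>V</td><td>V</td></tr>
</table>
"""


def find_rows(html, needle):
    """Return (header_row, rows whose visible text contains needle)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    header, body = rows[0], rows[1:]
    # Compare against the row's text, not against the Tag object itself.
    return header, [row for row in body if needle in row.get_text()]


def build_page(header, rows):
    """Join the header and the selected rows into a minimal HTML page."""
    lines = [str(header)] + [str(row) for row in rows]
    return "<html><body><table>\n" + "\n".join(lines) + "\n</table></body></html>"


header, matches = find_rows(SAMPLE_HTML, "lucas")
row = matches[0]
print("lucas" in row)             # membership test against the Tag's children
print("lucas" in row.get_text())  # substring test against the row's text
print(build_page(header, matches))
```

Swapping `SAMPLE_HTML` for the downloaded page should work the same way, provided the first `<tr>` on the page really is the header row.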
