heron

Reputation: 3661

Getting html tag value in python

I'm new to Python. Here is my code, which runs on Python 2.7.5:

import urllib2

# urlopen() needs a full URL, including the scheme
url = "http://mydomain.com"
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()

print data

This fetches the HTML markup, and it works.

What I want to do is get the value inside a <font class="big"></font> tag. For example, I need the value Data from this markup:

<font class="big">Data</font>

How can I do this?

Upvotes: 5

Views: 20332

Answers (2)

falsetru

Reputation: 369064

Using lxml:

import urllib2
import lxml.html

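# urlopen() needs a full URL, including the scheme
url = "http://mydomain.com"

usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
# cssselect() takes a CSS selector; 'font.big' matches <font> tags with class "big"
# (lxml 3.0 and later need the separate cssselect package for this)
for font in lxml.html.fromstring(data).cssselect('font.big'):
    print font.text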

>>> import lxml.html
>>> root = lxml.html.fromstring('<font class="big">Data</font>')
>>> [font.text for font in root.cssselect('font.big')]
['Data']

Upvotes: 1

TerryA

Reputation: 59974

You can use an HTML parser module such as BeautifulSoup:

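import urllib2
from bs4 import BeautifulSoup as BS

url = "http://mydomain.com"
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BS(data, 'html.parser')
# find() returns the first matching tag, or None if nothing matches
print soup.find('font', {'class': 'big'}).text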

This finds the first <font> tag with class="big" and prints its text content.
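If installing a third-party parser is not an option, the same lookup can be sketched with only the standard library. This is a rough illustration rather than a replacement for BeautifulSoup: BigFontParser and big_font_values are hypothetical names made up for this example, the sample markup is invented, and the import fallback is there so the sketch runs on both Python 2.7 and Python 3.

```python
# Sketch: collect the text of every <font class="big"> tag, standard library only.
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7


class BigFontParser(HTMLParser):
    """Hypothetical helper: gathers the text inside <font class="big"> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0      # greater than 0 while inside a matching <font> tag
        self.values = []    # one string per matching tag

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names, or be missing
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'font' and (self.depth or 'big' in classes):
            if self.depth == 0:
                self.values.append('')   # start a new result for this tag
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'font' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that appears inside a matching tag
        if self.depth:
            self.values[-1] += data


def big_font_values(html):
    """Return a list with the text of each <font class="big"> in html."""
    parser = BigFontParser()
    parser.feed(html)
    parser.close()   # flush any text the parser is still buffering
    return parser.values


sample = '<p>x</p><font class="big">Data</font><font class="small">no</font>'
print(big_font_values(sample))
```

Unlike find(), this returns every match as a list, so a page without the tag gives an empty list instead of None. It assumes reasonably well-formed markup; for messy real-world pages a dedicated parser such as BeautifulSoup or lxml is likely the safer choice.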

Upvotes: 9
