Reputation: 1

Python: how to fetch these strings

text=u'<a href="#5" accesskey="5"></a><a href="#1" accesskey="1"><font color="#667755">\ue689</font></a><a href="#2" accesskey="2"><font color="#667755">\ue6ec</font></a><a href="#3" accesskey="3"><font color="#667755">\ue6f6</font></a>'

I am new to Python. I want to get \ue6ec, \ue6f6, \ue6ec. How can I fetch these strings using the re module? Thank you very much!

Upvotes: 0

Answers (4)

Utku Zihnioglu

Reputation: 4883

If you know that the page will always have that format, use the BeautifulSoup parser to find what you need in the HTML.

However, BeautifulSoup can sometimes break on malformed HTML. I'd suggest using lxml, which is a Python binding for libxml2. It will parse and usually correct malformed HTML.
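
As a rough illustration of the lxml route, here is a minimal sketch in Python 3. It assumes the third-party lxml package is installed and that the wanted characters always sit inside <font> elements, as in the question's snippet. The variable names are only for illustration.

```python
# Assumes the third-party lxml package is installed (pip install lxml).
from lxml import html

text = (
    '<a href="#5" accesskey="5"></a>'
    '<a href="#1" accesskey="1"><font color="#667755">\ue689</font></a>'
    '<a href="#2" accesskey="2"><font color="#667755">\ue6ec</font></a>'
    '<a href="#3" accesskey="3"><font color="#667755">\ue6f6</font></a>'
)

# fromstring() accepts an HTML fragment and repairs it where it can.
fragment = html.fromstring(text)

# XPath: the text nodes of every <font> element in the parsed tree.
glyphs = [str(s) for s in fragment.xpath('//font/text()')]
print(glyphs)
```

The XPath expression ties the result to the <font> tags rather than to the exact attribute layout, so it should keep working if attributes are added or reordered.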

Upvotes: 0

Kimvais

Reputation: 39628

>>> from BeautifulSoup import BeautifulSoup
>>> text=u'<a href="#5" accesskey="5"></a><a href="#1" accesskey="1"><font color="#667755">\ue689</font></a><a href="#2" accesskey="2"><font color="#667755">\ue6ec</font></a><a href="#3" accesskey="3"><font color="#667755">\ue6f6</font></a>'
>>> t = BeautifulSoup(text)
>>> t.findAll(text=True)
[u'\ue689', u'\ue6ec', u'\ue6f6']

Upvotes: 2

user225312

Reputation: 131817

Don't use regular expressions to parse HTML. Use BeautifulSoup; see the BeautifulSoup documentation.

Upvotes: 1

ceth

Reputation: 45325

Regular expressions are not a good tool for working with HTML. Use Beautiful Soup.
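
If installing a third-party parser is not an option, the same parser-based idea can be sketched with the standard library's html.parser module in Python 3. This is not Beautiful Soup, only a stand-in that needs no install. The class name FontTextCollector is made up for this example, and it assumes the wanted text always sits inside <font> elements.

```python
from html.parser import HTMLParser


class FontTextCollector(HTMLParser):
    """Hypothetical helper: collects text found inside <font> elements."""

    def __init__(self):
        super().__init__()
        self._font_depth = 0  # how many <font> tags are currently open
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self._font_depth += 1

    def handle_endtag(self, tag):
        if tag == "font" and self._font_depth:
            self._font_depth -= 1

    def handle_data(self, data):
        # Keep text only while at least one <font> element is open.
        if self._font_depth:
            self.texts.append(data)


text = (
    '<a href="#5" accesskey="5"></a>'
    '<a href="#1" accesskey="1"><font color="#667755">\ue689</font></a>'
    '<a href="#2" accesskey="2"><font color="#667755">\ue6ec</font></a>'
    '<a href="#3" accesskey="3"><font color="#667755">\ue6f6</font></a>'
)

collector = FontTextCollector()
collector.feed(text)
collector.close()
print(collector.texts)
```

Unlike a regular expression, this does not depend on the order of attributes or on whitespace inside the tags.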

Upvotes: 2
