user521023

Reputation: 1

Python: how to extract these strings

text=u'<a href="#5" accesskey="5"></a><a href="#1" accesskey="1"><font color="#667755">\ue689</font></a><a href="#2" accesskey="2"><font color="#667755">\ue6ec</font></a><a href="#3" accesskey="3"><font color="#667755">\ue6f6</font></a>'

I am new to Python. I want to get \ue6ec, \ue6f6, \ue6ec. How can I extract these strings using the re module? Thank you very much!

Upvotes: 0

Views: 134

Answers (4)

Utku Zihnioglu

Reputation: 4883

If you know that the page will always have that format, use the BeautifulSoup parser to find what you need in the HTML.

However, BeautifulSoup may sometimes break on malformed HTML. I'd suggest using lxml, which is a Python binding for libxml2. It will parse and usually correct malformed HTML.
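
As a rough sketch of the lxml approach (assuming Python 3 and that the third-party lxml package is installed; the variable names are only illustrative), the text inside each font element can be pulled out with a relative XPath query:

```python
from lxml import html

# The markup from the question, split across lines for readability.
text = (
    '<a href="#5" accesskey="5"></a>'
    '<a href="#1" accesskey="1"><font color="#667755">\ue689</font></a>'
    '<a href="#2" accesskey="2"><font color="#667755">\ue6ec</font></a>'
    '<a href="#3" accesskey="3"><font color="#667755">\ue6f6</font></a>'
)

# html.fromstring accepts a fragment with several top-level elements
# and wraps them in a single parent element.
root = html.fromstring(text)

# Collect the text node of every <font> element below the root.
glyphs = [str(node) for node in root.xpath('.//font/text()')]

# ascii() escapes the private-use characters so they print safely.
print(ascii(glyphs))
```

The empty first link has no font child, so it contributes nothing to the result.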

Upvotes: 0

Kimvais

Reputation: 39628

>>> from BeautifulSoup import BeautifulSoup
>>> text=u'<a href="#5" accesskey="5"></a><a href="#1" accesskey="1"><font color="#667755">\ue689</font></a><a href="#2" accesskey="2"><font color="#667755">\ue6ec</font></a><a href="#3" accesskey="3"><font color="#667755">\ue6f6</font></a>'
>>> t = BeautifulSoup(text)
>>> t.findAll(text=True)
[u'\ue689', u'\ue6ec', u'\ue6f6']

Upvotes: 2

user225312

Reputation: 131817

Don't use regular expressions to parse HTML. Use BeautifulSoup; see the BeautifulSoup documentation.

Upvotes: 1

ceth

Reputation: 45325

Regular expressions are not a good tool for working with HTML. Use Beautiful Soup.
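
If installing a third-party package is not an option, the same parse-instead-of-regex idea can be sketched with the standard library's html.parser module. This swaps Beautiful Soup for the stdlib parser and assumes Python 3; the TextCollector class is a hypothetical helper written for this illustration, not part of any library:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers every non-blank text node."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by HTMLParser for the text found between tags.
        if data.strip():
            self.chunks.append(data)


# The markup from the question, split across lines for readability.
text = (
    '<a href="#5" accesskey="5"></a>'
    '<a href="#1" accesskey="1"><font color="#667755">\ue689</font></a>'
    '<a href="#2" accesskey="2"><font color="#667755">\ue6ec</font></a>'
    '<a href="#3" accesskey="3"><font color="#667755">\ue6f6</font></a>'
)

collector = TextCollector()
collector.feed(text)
collector.close()

# ascii() escapes the private-use characters so they print safely.
print(ascii(collector.chunks))
```

Unlike a regular expression, this does not depend on the exact attribute order or quoting of the tags, only on where the text nodes sit.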

Upvotes: 2
