Beautifulsoup extract string partially

Question

I am new to Beautiful Soup 4 and find it really convenient! However, I ran into a problem when I needed to extract part of a string:

An example here:

I have a link which is

 NIHAO

I get the line with soap.findChildren('a'), but what if I just need the part 'sort=102'?

I tried soap.find_all(re.compile(^sort=.*?)), but it does not work. Can anyone help me with that? Thanks in advance!

alecxe · Accepted Answer

To elaborate a little on @Don's answer:

Working sample:

>>> from bs4 import BeautifulSoup
>>> from urlparse import urlparse, parse_qs
>>>
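>>> # "..." stands for the part of the URL that is not shown here
>>> html = '<a href="...?sort=102">NIHAO</a>'
>>> soup = BeautifulSoup(html, "html.parser")
>>> parse_qs(urlparse(soup.find("a", text="NIHAO")['href']).query)['sort'][0]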
u'102'

Note that in Python 3, you would need to change the urlparse import to:

>>> from urllib.parse import urlparse, parse_qs
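
For completeness, here is a minimal Python 3 sketch of the same idea as a standalone script. The URL is not shown in the question, so `example.com` and the extra `page` parameter are made-up placeholders. It also shows one likely reason the original attempt failed: a pattern passed as the first argument to `find_all` is matched against tag names, whereas passing it as the `href` keyword matches it against the attribute value.

```python
import re
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

# Hypothetical markup: the real URL is not given in the question,
# so the host and the "page" parameter are placeholders.
html = '<a href="http://example.com/list?page=2&sort=102">NIHAO</a>'

soup = BeautifulSoup(html, "html.parser")

# Match the pattern against the href attribute, not against tag names.
link = soup.find("a", href=re.compile(r"[?&]sort="))

# Split the URL, then parse its query string into a dict of lists.
query = parse_qs(urlparse(link["href"]).query)
sort_value = query["sort"][0]

print(sort_value)  # 102
```

Using `parse_qs` rather than a regular expression on the whole URL means the value is found regardless of where `sort` appears among the query parameters.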
