Reputation: 123
I am new to Beautiful Soup 4 and find it really convenient! However, I ran into a problem when I needed to extract part of a string.
An example here:
I have a link which is
<a href="http://nihao-wobuhao?%93%23%24%12&sort=102">NIHAO</a>
I get the element with soap.findChildren('a'), but what if I just need the part 'sort=102'?
I tried soap.find_all(re.compile(^sort=.*?)), but it does not work. Can anyone help me with that? Thanks in advance!
Upvotes: 1
Views: 118
Reputation: 473863
To elaborate a little on @Don's answer:
- find the a element by, for example, its text
- get the href attribute value using dictionary-like access to the Tag's attributes
- use urlparse.parse_qs() to get the URL query parameters
Working sample:
>>> from bs4 import BeautifulSoup
>>> from urlparse import urlparse, parse_qs
>>>
>>> html = '<a href="http://nihao-wobuhao?%93%23%24%12&sort=102">NIHAO</a>'
>>> soup = BeautifulSoup(html, "html.parser")
>>> parse_qs(urlparse(soup.find("a", text="NIHAO")['href']).query)['sort'][0]
u'102'
Note that in Python 3, you would need to change the urlparse import to:
>>> from urllib.parse import urlparse, parse_qs
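For reference, the same steps could be written as a small Python 3 script. This is only a sketch: the helper name get_query_param is made up for illustration, it assumes bs4 is installed, and it uses the string= argument, which newer Beautiful Soup releases prefer over text=.

```python
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

html = '<a href="http://nihao-wobuhao?%93%23%24%12&sort=102">NIHAO</a>'


def get_query_param(markup, link_text, name):
    """Hypothetical helper: return the first value of query parameter `name`
    from the href of the <a> element whose text is `link_text`."""
    soup = BeautifulSoup(markup, "html.parser")
    # Locate the <a> element by its text.
    link = soup.find("a", string=link_text)
    # Dictionary-like access to the tag's attributes.
    href = link["href"]
    # parse_qs maps each parameter name to a list of values.
    return parse_qs(urlparse(href).query)[name][0]


sort_value = get_query_param(html, "NIHAO", "sort")
print(sort_value)
```

If the link or the parameter might be missing, the function would need to check for None and use .get() on the parsed dictionary instead of indexing.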
Upvotes: 0
Reputation: 56640
The urlparse module will pick out the pieces of a URL. You could use that to get the query parameter you're looking for.
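A rough illustration of what "pieces" means here, using the URL from the question and the Python 3 location of the module (urllib.parse). The variable names are just for the example.

```python
from urllib.parse import urlparse, parse_qs

url = "http://nihao-wobuhao?%93%23%24%12&sort=102"

# urlparse splits the URL into named components.
parts = urlparse(url)
scheme = parts.scheme   # the part before "://"
host = parts.netloc     # the host part
query = parts.query     # everything after "?"

# parse_qs turns the query string into a dict of lists.
# Pieces without "=" (like the first one here) are skipped by default.
params = parse_qs(query)

# .get() avoids a KeyError when the parameter is absent.
sort_values = params.get("sort", [])
missing_values = params.get("page", [])

print(scheme, host, query)
print(sort_values, missing_values)
```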
Upvotes: 1