Reputation: 11
This is a snippet of the HTML I have:
<body>
<form method="post" action="./pagam.aspx?a=9095709&b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I" id="aspnetForm" autocomplete="off">
<div>
I would like to extract this value:
./pagam.aspx?a=9095709&b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I
from the HTML.
This is what I currently have, but I am unsure how to do it:
parsed_html = BeautifulSoup(html, 'lxml')
a = parsed_html.body.find('div', attrs={'form method':'post'})
print (a)
Upvotes: 1
Views: 39
Reputation: 483
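Here is something you can try. Note that this sample adds a name attribute to the form; the HTML in the question only has id="aspnetForm", so match on that instead if there is no name: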
>>> from bs4 import BeautifulSoup
>>> s = BeautifulSoup('''<body>
... <form method="post" name="mainForm" action="./pagam.aspx?a=9095709&b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I" id="aspnetForm" autocomplete="off"></body>''', 'html.parser')
>>> s.find("form", {"name":"mainForm"})
>>> s.find("form", {"name":"mainForm"})['action']
u'./pagam.aspx?a=9095709&b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I'
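Applied to the HTML from the question (which has id="aspnetForm" but no name attribute), a minimal sketch could look like the one below. It assumes the built-in 'html.parser' backend ('lxml' should behave the same here), and the closing tags in the sample string are my own addition to make the snippet self-contained.

```python
from bs4 import BeautifulSoup

# Sample markup from the question; the closing tags are added for illustration.
html = '''<body>
<form method="post" action="./pagam.aspx?a=9095709&b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I" id="aspnetForm" autocomplete="off">
<div>
</div>
</form>
</body>'''

soup = BeautifulSoup(html, "html.parser")

# Look the form up by its id, since that is the attribute the page provides.
form = soup.find("form", id="aspnetForm")

# find() returns None when nothing matches, so guard before indexing.
action = form["action"] if form is not None else None
print(action)
```

The original attempt returned nothing because it searched for a div and treated 'form method' as an attribute name; form is the tag, and method and action are its attributes.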
Upvotes: 1
Reputation: 302
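import re
# Capture everything between the quotes that follow action=
r = re.compile(r'action="([^"]+)"')
m = r.search(line)  # search(), not match(): action= is not at the start of the line
if m:
    print(m.group(1))  # the URL only; splitting on "=" would also cut the query string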
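If you want to stay with a regex, here is a small self-contained sketch run against the form line from the question. The names ACTION_RE and extract_action are made up for this example, and it assumes the attribute is always written with double quotes; a real HTML parser is more robust if the markup varies.

```python
import re

# Hypothetical helper names; assumes action is written with double quotes.
ACTION_RE = re.compile(r'action="([^"]*)"')

def extract_action(line):
    """Return the value of the first action="..." attribute, or None."""
    m = ACTION_RE.search(line)
    return m.group(1) if m else None

# The form line copied from the question.
line = '<form method="post" action="./pagam.aspx?a=9095709&b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I" id="aspnetForm" autocomplete="off">'

print(extract_action(line))
```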
Upvotes: 1