Reputation: 11

Parsing HTML value in Python

This is a snippet of the HTML I have:

<body>
    <form method="post" action="./pagam.aspx?a=9095709&amp;b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I" id="aspnetForm" autocomplete="off">
<div>

I would like to extract this value:

./pagam.aspx?a=9095709&amp;b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I

from the HTML.

This is what I have so far, but I am unsure how to do it:

parsed_html = BeautifulSoup(html, 'lxml')
a = parsed_html.body.find('div', attrs={'form method':'post'})
print (a)

Upvotes: 1

Answers (2)

P.B.UDAY

Reputation: 483

Here is something you can try:

>>> from bs4 import BeautifulSoup
>>> html = '<body><form method="post" name="mainForm" action="./pagam.aspx?a=9095709&amp;b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I" id="aspnetForm" autocomplete="off"></body>'
>>> s = BeautifulSoup(html, 'html.parser')
>>> form = s.find("form", {"name":"mainForm"})
>>> form['action']
u'./pagam.aspx?a=9095709&b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I'
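
The form in the question carries id="aspnetForm" but no name attribute, so a lookup by id may be closer to what is needed there. A minimal sketch, assuming Python 3, the built-in html.parser backend, and the markup from the question (the closing tags and the variable names are added here for illustration):

```python
from bs4 import BeautifulSoup

# Markup copied from the question; the closing tags are an assumption,
# since the question only shows the opening part of the page.
html = '''<body>
    <form method="post" action="./pagam.aspx?a=9095709&amp;b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I" id="aspnetForm" autocomplete="off">
<div>
</div>
</form>
</body>'''

soup = BeautifulSoup(html, "html.parser")

# Look the form up by its id; find() returns None when nothing matches.
form = soup.find("form", id="aspnetForm")
action = form["action"] if form is not None else None
print(action)
```

The parser decodes &amp; to & in attribute values, so the string returned is the URL as a browser would submit it rather than the raw text from the HTML source.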

Upvotes: 1

Evgeniy

Reputation: 302

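import re
# line is assumed to hold the HTML text containing the <form> tag
r = re.compile(r'action="([^"]*)"')  # raw string; the group captures the quoted value
m = r.search(line)  # search, not match: the attribute is not at the start of the line
if m:
    print(m.group(1))  # the value as written in the HTML, with &amp; still encoded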
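
A regular expression returns the attribute text exactly as it is written in the HTML, so &amp; stays encoded. A self-contained sketch, assuming Python 3 and that line holds the form tag from the question; html.unescape from the standard library can decode the entity if the plain URL is wanted:

```python
import re
from html import unescape

# The form tag from the question, used here as sample input.
line = '<form method="post" action="./pagam.aspx?a=9095709&amp;b=RkVsgP1UClEdbu0oUvc8pKDxd5OcslXk1xHlVhK7uuqH_7ZfaquNNa1VHgeSZWm9hAq4s7Thk6wIhoRsooDoMF7U2nzmVDDbRujlxaPTg8I" id="aspnetForm" autocomplete="off">'

# Capture everything between the quotes that follow action=.
match = re.search(r'action="([^"]*)"', line)

raw_action = match.group(1) if match else None  # still contains &amp;
action = unescape(raw_action) if raw_action is not None else None  # &amp; becomes &

print(raw_action)
print(action)
```

This only works for simple, predictably formatted markup (double-quoted attributes on one line); an HTML parser is usually the more robust choice.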

Upvotes: 1
