Reputation: 4516
I have some <div>s and other content on a site, and this specific line sits in the middle of numerous divs:
<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>
How can I get the "value" attribute from this tag, which sits in the middle of a page with other content?
I'm trying with urllib, but I don't even know where to start.
Upvotes: 0
Views: 3005
Reputation: 50547
import lxml.html as lh
html = '''
<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>
'''
# If you want to parse from a URL:
# tree = lh.parse('http://example.com')
tree = lh.fromstring(html)
# xpath() returns a list of the matching attribute values
print(tree.xpath("//input[@name='extWarrantyProds']/@value"))
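If lxml is not installed, roughly the same lookup can be done with only the standard library. This is a minimal Python 3 sketch and not part of the answer above; the names `InputValueFinder` and `find_input_values` are made up for illustration.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Collects the value attribute of every <input> with a given name."""

    def __init__(self, target_name):
        super().__init__()
        self.target_name = target_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        # Self-closing tags such as <input .../> also end up here by default.
        if tag == "input":
            attr_map = dict(attrs)
            if attr_map.get("name") == self.target_name:
                self.values.append(attr_map.get("value"))


def find_input_values(html, name):
    """Hypothetical helper: return all value attributes for inputs called `name`."""
    finder = InputValueFinder(name)
    finder.feed(html)
    finder.close()
    return finder.values


html = '''
<div><input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/></div>
'''
print(find_input_values(html, "extWarrantyProds"))
```

Like the XPath call, this returns a list, so take the first element if you expect exactly one match.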
Upvotes: 3
Reputation: 12974
The easiest way I can think of:
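# Python 2; in Python 3 use urllib.request.urlopen and decode each line
import urllib
urlStr = "http://www..."
fileObj = urllib.urlopen(urlStr)
for line in fileObj:
    if '<input name="extWarrantyProds"' in line:
        # Skip past the 7 characters of 'value="' to reach the value itself
        startIndex = line.find('value="') + 7
        endIndex = line.find('"', startIndex)
        print(line[startIndex:endIndex])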
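On Python 3 the response yields bytes rather than str, so the same line scan needs a decode step. Below is a rough sketch under a few assumptions: the page is UTF-8, the whole tag sits on one line, and `name` comes before `value`. The function name `extract_value` is hypothetical.

```python
def extract_value(lines, marker='<input name="extWarrantyProds"'):
    """Return the value attribute from the first line containing marker, else None."""
    prefix = 'value="'
    for raw in lines:
        # urlopen() yields bytes in Python 3; decode so str methods work
        line = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else raw
        pos = line.find(marker)
        if pos == -1:
            continue
        # Search for value=" only after the marker, so an earlier tag on the
        # same line cannot be picked up by mistake
        start = line.find(prefix, pos)
        if start == -1:
            continue
        start += len(prefix)
        end = line.find('"', start)
        if end == -1:
            continue
        return line[start:end]
    return None


sample = [
    b"<div>\n",
    b'<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>\n',
    b"</div>\n",
]
print(extract_value(sample))
```

With a real page you would pass the response object itself, for example `with urllib.request.urlopen(urlStr) as resp: print(extract_value(resp))`.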
Upvotes: 3
Reputation: 34734
No need for anything too fancy if that's all you need. Download the page using urllib and look for the value using re.findall().
import re
import urllib
url = 'http://...'
html = urllib.urlopen(url).read()
# Find every <input> tag with the target name, then pull out its value attribute
matches = re.findall('<input name="extWarrantyProds.*?>', html, re.DOTALL)
for i in matches:
    print(re.findall('value="(.*?)"', i))
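The two findall passes could also be folded into a single pattern with one capturing group. This is only a sketch in Python 3: it assumes `name` appears before `value` inside the same tag, and `find_values` is a made-up helper name.

```python
import re

# [^>]*? cannot cross a '>' character, so the match stays inside one tag
PATTERN = re.compile(
    r'<input\s+name="extWarrantyProds"[^>]*?\svalue="([^"]*)"',
    re.IGNORECASE,
)


def find_values(html):
    """Return the value attribute of every matching <input> tag."""
    return PATTERN.findall(html)


html = '<div>\n<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>\n</div>'
print(find_values(html))
```

As with any regex over HTML, this breaks if the attribute order or quoting changes, so a real parser is safer when the markup is not under your control.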
Upvotes: 1