Reputation: 13
Here is the HTML code that I am trying to extract the text from:
<fieldset>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">CNPJ:</label>011234560083
</div>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">CIDADE:</label>TAUBATE
</div>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">ESTADO:</label>SP
</div>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">TOTAL BRUTO: </label>2.407,09
</div>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">LIQ: </label>2.344,09
</div>
</fieldset>
This code,
print browse.find_element_by_xpath("//div[@class='grid-3-12 form-no-lbl']").text
returns just the first element: 011234560083
How can I read the value for each label? For example, "LIQ:" = 2.344,09
Upvotes: 1
Views: 3427
Reputation: 879073
If you have the luxury of having both Selenium and lxml available, you could use Selenium to navigate to the desired page(s) and then use lxml to parse the HTML. For example,
import lxml.html as LH
# content = browser.page_source
content = '''\
<fieldset>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">CNPJ:</label>011234560083
</div>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">CIDADE:</label>TAUBATE
</div>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">ESTADO:</label>SP
</div>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">TOTAL BRUTO: </label>2.407,09
</div>
<div class="grid-3-12 form-no-lbl">
<label class="form-lbl">LIQ: </label>2.344,09
</div>
</fieldset>'''
root = LH.fromstring(content)
labels = root.xpath('//fieldset/div[@class="grid-3-12 form-no-lbl"]/label')
data = [[item.strip() for item in [elt.text, elt.tail]] for elt in labels]
yields
[['CNPJ:', '011234560083'],
['CIDADE:', 'TAUBATE'],
['ESTADO:', 'SP'],
['TOTAL BRUTO:', '2.407,09'],
['LIQ:', '2.344,09']]
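To answer the "LIQ:" lookup from the question directly, the pairs above can be turned into a dictionary. This is only a minimal sketch: it repeats the `data` list so the snippet runs on its own, and the name `values` and the choice to drop the trailing colon from each key are my assumptions.

```python
# Label/value pairs as produced by the lxml snippet above.
data = [['CNPJ:', '011234560083'],
        ['CIDADE:', 'TAUBATE'],
        ['ESTADO:', 'SP'],
        ['TOTAL BRUTO:', '2.407,09'],
        ['LIQ:', '2.344,09']]

# Map each label (without its trailing colon) to its value.
values = {label.rstrip(':').strip(): value for label, value in data}

print(values['LIQ'])  # prints 2.344,09
```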
Upvotes: 1
Reputation: 25531
It seems really odd that your code doesn't work; I haven't run into a case quite like this. I think the code below should work. It grabs the text inside the LABEL and prepends it to the text you are already finding. The combination should give you the text you are looking for.
lines = browse.find_elements_by_css_selector("div.grid-3-12.form-no-lbl")
for line in lines:
    print(line.find_element_by_css_selector("label.form-lbl").text + line.text)
Upvotes: 1
Reputation: 473753
This is a rather common problem in Selenium, because you cannot directly match text nodes with the find_element_by_* commands.
In your case, I assume you know the labels (LIQ, ESTADO, etc.) beforehand and need to get a value by its label.
The idea is to locate a label by its text, move up the tree to its parent, get the parent's text, split it on : and take the last element, which corresponds to the desired value:
label = "ESTADO"
text = driver.find_element_by_xpath("//label[starts-with(., '%s:')]/.." % label).text
print(text.split(":")[-1].strip())
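The string handling in the last step can be tried without a browser. The sketch below assumes the parent element's text is the label immediately followed by the value, such as "ESTADO:SP"; `value_from_text` is a hypothetical helper name, not part of Selenium.

```python
def value_from_text(text):
    """Return the part of `text` after the last colon, stripped of whitespace."""
    return text.split(":")[-1].strip()

# Assumed examples of what the parent <div> text might look like.
print(value_from_text("ESTADO:SP"))              # prints SP
print(value_from_text("TOTAL BRUTO: 2.407,09"))  # prints 2.407,09
```

Taking the last piece would cut short a value that itself contains a colon; `text.partition(":")[2].strip()` keeps everything after the first colon instead.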
Upvotes: 0