Lara

Reputation: 13

How can I extract text after an element using Python and Selenium?

Here is the HTML code that I am trying to extract the text from:

<fieldset>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CNPJ:</label>011234560083
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CIDADE:</label>TAUBATE
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">ESTADO:</label>SP
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">TOTAL BRUTO: </label>2.407,09
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">LIQ: </label>2.344,09
    </div>
</fieldset>

This code,

print browse.find_element_by_xpath("//div[@class='grid-3-12 form-no-lbl']").text

returns just the first element: 011234560083

How can I read values for each label? Like "LIQ:" = 2.344,09

Upvotes: 1

Views: 3427

Answers (3)

unutbu

Reputation: 879073

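If you have the luxury of having both Selenium and lxml available, you could use Selenium to navigate to the desired page(s) and then use lxml to parse the HTML. Each label's own text is in its .text attribute, and the text that follows the closing tag is in its .tail attribute. For example,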

import lxml.html as LH
# content = browser.page_source
content = '''\
<fieldset>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CNPJ:</label>011234560083
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">CIDADE:</label>TAUBATE
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">ESTADO:</label>SP
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">TOTAL BRUTO: </label>2.407,09
    </div>
    <div class="grid-3-12 form-no-lbl">
            <label class="form-lbl">LIQ: </label>2.344,09
    </div>
</fieldset>'''

root = LH.fromstring(content)
labels = root.xpath('//fieldset/div[@class="grid-3-12 form-no-lbl"]/label')
data = [[item.strip() for item in [elt.text, elt.tail]] for elt in labels]

yields

[['CNPJ:', '011234560083'],
 ['CIDADE:', 'TAUBATE'],
 ['ESTADO:', 'SP'],
 ['TOTAL BRUTO:', '2.407,09'],
 ['LIQ:', '2.344,09']]
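To look a value up by its label, as the question asks, the pairs above can be turned into a dictionary. This is only a minimal sketch: the `values` name is made up for illustration, and it assumes every label ends with a colon and appears only once.

```python
# Pairs copied from the output shown above.
data = [['CNPJ:', '011234560083'],
        ['CIDADE:', 'TAUBATE'],
        ['ESTADO:', 'SP'],
        ['TOTAL BRUTO:', '2.407,09'],
        ['LIQ:', '2.344,09']]

# Drop the trailing colon from each label and build a lookup table.
# 'values' is a hypothetical name, not part of the original answer.
values = {label.rstrip(':'): value for label, value in data}

print(values['LIQ'])  # 2.344,09
```

If a label could repeat on the page, a list of pairs is probably the safer structure to keep.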

Upvotes: 1

JeffC

Reputation: 25531

It seems really odd that your code doesn't work. I haven't run into a case quite like this. I think the code below should work. Basically I grab the text inside the LABEL and prepend it to the text you are already finding. The combination should get you the text you are looking for.

lines = browse.find_elements_by_css_selector("div.grid-3-12.form-no-lbl")
for line in lines:
    print(line.find_element_by_css_selector("label.form-lbl").text + line.text)

Upvotes: 1

alecxe

Reputation: 473753

This is a rather common problem in Selenium, because you cannot directly match text nodes with the find_element_by_* commands.

In your case, I assume you know the LIQ, ESTADO etc labels beforehand and need to get a value by the label.

The idea would be to locate a label by text, move up the tree to the parent, get the text, split by : and get the last element which would correspond to the desired value:

label = "ESTADO"
text = driver.find_element_by_xpath("//label[starts-with(., '%s:')]/.." % label).text
print(text.split(":")[-1].strip())
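If you need every label rather than one known in advance, the same split-on-colon idea can be applied to the text of each div. The sketch below is browser-free so it can be run as-is. It assumes each div's text looks like "LABEL: value", and both `parse_label_value` and `sample_texts` are hypothetical names used only for illustration.

```python
def parse_label_value(text):
    """Split 'LABEL: value' on the first colon and trim both parts."""
    label, _, value = text.partition(":")
    return label.strip(), value.strip()

# Assumed stand-ins for the .text of each matching div; in a real session
# these strings would come from the elements located with Selenium.
sample_texts = ["CNPJ:011234560083", "TOTAL BRUTO: 2.407,09", "LIQ: 2.344,09"]

values = dict(parse_label_value(t) for t in sample_texts)
print(values["LIQ"])  # 2.344,09
```

Using partition on the first colon, rather than taking the last piece of split(":"), keeps a value intact even if the value itself happens to contain a colon.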

Upvotes: 0
