MLSC

Reputation: 5972

Extracting the value by xpath in python between tags

I want to extract the value that I marked in the picture below.

What I have tried is:

import requests
from lxml import html

url='http://site.ir'
content=requests.get(url).content
tree = html.fromstring(content)
print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]/????')])

The value is not inside the span tag and not inside the br tag.

The picture: [image]

UPDATE

Imagine I have:

out = """
<div class="groupinfo">
    <div class="grouptext">
        <span style="color:#5f0101">
            span tag contents
        </span>
        WHAT I WANT
        <br></br>
    </div>
</div>
<div class="groupinfo">
    <div class="grouptext">
        <span style="color:#5f0101">
            span tag contents
        </span>
        WHAT I WANT(1)
        <br></br>
    </div>
</div>
<div class="groupinfo">
    <div class="grouptext">
        <span style="color:#5f0101">
            span tag contents
        </span>
        WHAT I WANT(2)
        <br></br>
    </div>
</div>
<div class="groupinfo">
    <div class="grouptext">
        <span style="color:#5f0101">
            span tag contents
        </span>
        WHAT I WANT(3)
        <br></br>
    </div>
</div>
"""

Upvotes: 2

Views: 1563

Answers (2)

alecxe

Reputation: 474151

Another option would be to get the text node that follows the span as a sibling:

//div[@class="grouptext"]/span[1]/following-sibling::text()

Demo:

from lxml import html

data = """
<div class="groupinfo">
    <div class="grouptext">
        <span style="color:#5f0101">
            span tag contents
        </span>
        WHAT I WANT
        <br></br>
    </div>
</div>
"""

tree = html.fromstring(data)
print(tree.xpath('//div[@class="grouptext"]/span[1]/following-sibling::text()')[0].strip())

Prints:

WHAT I WANT

For the updated example, here's what worked for me:

for result in tree.xpath('//div[@class="grouptext"]/span/following-sibling::text()'):
    print(result.strip())

Prints:

WHAT I WANT

WHAT I WANT(1)

WHAT I WANT(2)

WHAT I WANT(3)
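The loop above can also return whitespace-only text nodes (for example the text after the br), which show up as empty lines. A minimal sketch of one way to skip them, assuming the same markup as in the question (shortened here to two groups): add a normalize-space() predicate to the XPath so that only text nodes with visible content are returned. The names xpath and results are just illustrative.

```python
from lxml import html

# Sample markup modelled on the question's updated example (two groups shown).
data = """
<div class="groupinfo">
    <div class="grouptext">
        <span style="color:#5f0101">span tag contents</span>
        WHAT I WANT
        <br></br>
    </div>
</div>
<div class="groupinfo">
    <div class="grouptext">
        <span style="color:#5f0101">span tag contents</span>
        WHAT I WANT(1)
        <br></br>
    </div>
</div>
"""

tree = html.fromstring(data)

# [normalize-space()] keeps only text nodes that contain non-whitespace characters.
xpath = '//div[@class="grouptext"]/span/following-sibling::text()[normalize-space()]'

# strip() removes the surrounding newlines and indentation from each match.
results = [text.strip() for text in tree.xpath(xpath)]
print(results)
```

Filtering inside the XPath keeps the Python side to a single list comprehension; filtering in Python with "if text.strip()" would work just as well.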

Upvotes: 1

Mathias Müller

Reputation: 22647

Looks like that's the text content of a div element. Unfortunately, "what you want" is unreadable because you scribbled "WHAT I WANT" on it.

What you are (most likely) looking for is a text node. It is not actually "between tags"; it is a child of the div[@class="grouptext"] element. There might be more than one such text node as a child of this div.

Try:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]')])

Or

print(tree.xpath('//div[@class="grouptext"]/text()'))

might work as well, but I am not quite familiar with Python.
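To make the "text node as a child of the div" point concrete, here is a rough sketch using the standard library's xml.etree.ElementTree instead of lxml (a swapped-in parser, chosen only because the sample happens to be well-formed XML; a real page will usually need an HTML parser such as lxml.html). In the ElementTree model, which lxml shares, the text that follows a closing tag is stored as that element's tail. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Well-formed sample based on the question; real pages may not parse as XML.
data = """
<div class="groupinfo">
    <div class="grouptext">
        <span style="color:#5f0101">span tag contents</span>
        WHAT I WANT
        <br></br>
    </div>
</div>
"""

root = ET.fromstring(data)

wanted = []
# ElementTree supports a limited XPath subset, including [@attr='value'] predicates.
for span in root.findall(".//div[@class='grouptext']/span"):
    # The text node that follows </span> is stored as the span's tail.
    tail = (span.tail or "").strip()
    if tail:
        wanted.append(tail)

print(wanted)
```

The same .tail attribute is available on elements returned by lxml, so this approach should carry over if the XPath text() route is inconvenient.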

Upvotes: 0
