Reputation: 5972
I want to extract the text that I marked in the picture below.
What I have tried is:
import requests
from lxml import html

url = 'http://site.ir'
content = requests.get(url).content
tree = html.fromstring(content)
print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]/????')])
The text is not inside the span tag, and it is not inside the br tag either.
The picture:
Imagine I have:
out = """
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT
<br></br>
</div>
</div>
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT(1)
<br></br>
</div>
</div>
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT(2)
<br></br>
</div>
</div>
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT(3)
<br></br>
</div>
</div>
"""
Upvotes: 2
Views: 1563
Reputation: 474151
Another option would be to get the text node that follows the span as a sibling:
//div[@class="grouptext"]/span[1]/following-sibling::text()
Demo:
from lxml import html
data = """
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT
<br></br>
</div>
</div>
"""
tree = html.fromstring(data)
print(tree.xpath('//div[@class="grouptext"]/span[1]/following-sibling::text()')[0].strip())
Prints:
WHAT I WANT
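As a possible alternative to the XPath text() axis, lxml also exposes the same text through the element API: text that directly follows an element's closing tag is stored in that element's .tail attribute. The sketch below assumes the same sample markup as the demo above; the variable names span and wanted are only illustrative.

```python
from lxml import html

# Sample markup copied from the question.
data = """
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT
<br></br>
</div>
</div>
"""

tree = html.fromstring(data)

# Take the first span inside the grouptext div.
span = tree.xpath('//div[@class="grouptext"]/span')[0]

# .tail holds the text between </span> and the next tag; it is None
# when no text follows, so fall back to an empty string before stripping.
wanted = (span.tail or "").strip()
print(wanted)
```

This only reads the text immediately after the span, so it would miss any text that comes after a later sibling such as the br.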
For the updated example, here's what worked for me:
for result in tree.xpath('//div[@class="grouptext"]/span/following-sibling::text()'):
    print(result.strip())
Prints:
WHAT I WANT
WHAT I WANT(1)
WHAT I WANT(2)
WHAT I WANT(3)
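One caveat with following-sibling::text() is that it can also return whitespace-only text nodes, for example the newline after the br, which would show up as blank lines after strip(). A minimal sketch of one way to skip them, assuming the four-group sample from the question, is to add a normalize-space() predicate; the names data and results are illustrative.

```python
from lxml import html

# Four groups, as in the updated example from the question.
group = """
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT{suffix}
<br></br>
</div>
</div>
"""
data = "".join(group.format(suffix=s) for s in ["", "(1)", "(2)", "(3)"])

tree = html.fromstring(data)

# [normalize-space()] keeps only text nodes that still contain
# characters once surrounding whitespace is collapsed.
xpath = '//div[@class="grouptext"]/span/following-sibling::text()[normalize-space()]'
results = [text.strip() for text in tree.xpath(xpath)]

for result in results:
    print(result)
```

Filtering inside the XPath expression keeps the Python loop simple; filtering in Python with `if text.strip()` should give the same result.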
Upvotes: 1
Reputation: 22647
Looks like that's the text content of a div element. Unfortunately, "what you want" is unreadable because you scribbled "WHAT I WANT" on it.
What you are (most likely) looking for is a text node. It is not actually "between tags"; it is a child of the div[@class="grouptext"] element. There might be more than one such text node as a child of this div.
Try:
print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]')])
Or
print(tree.xpath('//div[@class="grouptext"]/text()'))
might work as well, but I am not very familiar with Python.
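To illustrate the difference between the two suggestions, here is a small sketch, assuming the single-group sample markup from the question. text_content() gathers the text of all descendants, so the span's text comes along with it, while text() selects only the div's own child text nodes. The normalize-space() predicate is an addition here to drop whitespace-only nodes.

```python
from lxml import html

# Sample markup copied from the question.
data = """
<div class="groupinfo">
<div class="grouptext">
<span style="color:#5f0101">
span tag contents
</span>
WHAT I WANT
<br></br>
</div>
</div>
"""

tree = html.fromstring(data)
div = tree.xpath('//div[@class="grouptext"]')[0]

# text_content() concatenates the text of the div and all its
# descendants, so "span tag contents" is part of the result.
all_text = div.text_content()

# text() selects only text nodes that are direct children of the div;
# the predicate filters out the whitespace-only ones.
own_text = [t.strip() for t in div.xpath('text()[normalize-space()]')]

print(repr(all_text))
print(own_text)
```

So for this markup the text() form is probably the closer match to what the question asks for, since it leaves out the span's contents.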
Upvotes: 0