Reputation: 26258
Using lxml in Python, I wrote this XPath expression:
htmlPage.xpath("/html/body//a/text()")
It gets me the text of all <a> tags in the parts of the HTML I am interested in. Now I have found that the <a> tags can look like this:
<a>This is a sentence with some <italic>italic text</italic>-formatting I want to parse.</a>
xpath returns a list with one element more than I expect. I checked and found that it splits the <a> tag shown above into two list elements instead of one. Instead of the string
"This is a sentence with some italic text-formatting I want to parse."
I get the two strings
"This is a sentence with some" # and
"-formatting I want to parse."
Is there a way to correct that?
Upvotes: 0
Views: 680
Reputation: 26258
I solved my problem by first getting all <a> tags:
results = htmlPage.xpath("/html/body//a")
and then iterating over the returned list and calling text_content() on each element:
for a_tag in results:
    # prints the whole string: "This is a sentence with some italic text-formatting I want to parse."
    print(a_tag.text_content())
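For anyone who wants to reproduce this, here is a minimal sketch that puts the two approaches side by side. The SAMPLE markup and the variable names are made up for illustration (modelled on the <a> tag from the question), and it assumes the page is parsed with lxml.html, since text_content() is a method of lxml's HTML elements. The string() variant is an alternative I am adding here, not part of the original solution.

```python
from lxml import html

# Hypothetical sample markup, modelled on the <a> tag from the question.
SAMPLE = (
    "<html><body><div>"
    "<a>This is a sentence with some <italic>italic text</italic>"
    "-formatting I want to parse.</a>"
    "</div></body></html>"
)

html_page = html.fromstring(SAMPLE)

# text() selects only the text nodes that are direct children of <a>,
# so the text inside <italic> is skipped and the sentence comes back in pieces.
direct_text = html_page.xpath("/html/body//a/text()")

# text_content() joins the text of the element and all of its descendants.
a_tags = html_page.xpath("/html/body//a")
full_text = [a_tag.text_content() for a_tag in a_tags]

# Alternative (an assumption, not from the answer above): XPath's string()
# function evaluated relative to each <a> element gives the same result.
via_string = [a_tag.xpath("string()") for a_tag in a_tags]

print(direct_text)
print(full_text)
```

The split happens because /text() only matches text nodes directly under <a>, while the words inside the nested tag belong to a child element. Selecting the elements first and asking each one for its full text sidesteps that.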
Upvotes: 2