Reputation: 2106
Here is the XML file: http://www.diveintopython3.net/examples/feed.xml
My questions are:
How do I remove the \n and the whitespace that follows it in the text?
How do I get the node whose text is "dive into mark"? What is the syntax for searching by text?
Upvotes: 3
Views: 4256
Reputation: 180391
Just call normalize-space(.) on each node.
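import lxml.etree as et

xml = et.parse("feed.xml")
# Atom elements live in this namespace, so bind it to a prefix for XPath.
ns = {"ns": 'http://www.w3.org/2005/Atom'}
for n in xml.xpath("//ns:category", namespaces=ns):
    # Summary of the entry that contains this category.
    t = n.xpath("./../ns:summary", namespaces=ns)[0]
    print(t.xpath("normalize-space(.)"))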
Output:
Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
The accessibility orthodoxy does not permit people to question the value of features that are rarely useful and rarely used.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
All your newlines have been removed, leading and trailing whitespace has been stripped, and runs of whitespace have been replaced with a single space.
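The loop above prints a summary once per category, so an entry with several categories shows up several times. If one line per entry is enough, a minimal sketch is to select the summaries directly. The SAMPLE document below is a made-up stand-in for feed.xml (an assumption for illustration, not the real feed) so the snippet runs on its own:

```python
import lxml.etree as et

# Hypothetical, trimmed-down stand-in for feed.xml (not the real file).
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>dive into mark</title>
  <entry>
    <category term="a"/>
    <category term="b"/>
    <summary type="html">Putting an entire chapter
      on one page   sounds bloated</summary>
  </entry>
</feed>"""

ns = {"ns": "http://www.w3.org/2005/Atom"}
root = et.fromstring(SAMPLE)

# Select each entry's summary once instead of once per category.
summaries = [
    s.xpath("normalize-space(.)")
    for s in root.xpath("//ns:entry/ns:summary", namespaces=ns)
]
print(summaries)
```

With the real file you would keep et.parse("feed.xml") and only change the XPath expression.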
For the second part of your question: the title tag is the only tag with the text you are looking for. To find the title with that exact text specifically:
xml.xpath("//ns:title[text()='dive into mark']", namespaces=ns)
If you want any node with that exact text, regardless of its tag name, replace ns:title with a wildcard:
xml.xpath("//*[text()='dive into mark']", namespaces=ns)
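Note that text()='...' is an exact comparison, so it stops matching as soon as the text is padded with newlines or spaces. A rough sketch of two more forgiving variants, normalize-space(.) for whitespace-tolerant matching and contains() for substring matching, is shown below. The SAMPLE document is hypothetical: the padding around the title is added on purpose to show the difference and is not how the real feed is laid out.

```python
import lxml.etree as et

# Hypothetical sample; the title is deliberately padded with whitespace.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>
    dive into mark
  </title>
  <subtitle>currently between addictions</subtitle>
  <entry><title>Dive into history, 2009 edition</title></entry>
</feed>"""

ns = {"ns": "http://www.w3.org/2005/Atom"}
root = et.fromstring(SAMPLE)

# Exact comparison: fails here because of the surrounding whitespace.
exact = root.xpath("//*[text()='dive into mark']")

# Compare against the whitespace-normalized string value instead.
normalized = root.xpath("//*[normalize-space(.)='dive into mark']")

# Substring search, restricted to title elements.
partial = root.xpath("//ns:title[contains(., 'into')]", namespaces=ns)

print(len(exact), len(normalized), len(partial))
```

The substring search is limited to ns:title because the string value of a parent element includes all of its descendants' text, so a wildcard with contains(., ...) would also match the enclosing feed and entry elements.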
Upvotes: 2