ljf

Reputation: 65

LXML: Get header/top-level comment

Is there a way, preferably using the lxml library, to access a comment at the very top of an XML document once it has been parsed? I would like to avoid parsing the plain text myself.

Here is an example that should make my use case obvious :)

<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "https://www.kegg.jp/kegg/xml/KGML_v0.7.2_.dtd">
<!-- Creation date: Jan 11, 2019 11:48:16 +0900 (GMT+9) -->

So I am hoping for a function that returns the comment in the last line. I am also open to any other ideas on how to handle this nicely, of course.

Upvotes: 2

Views: 484

Answers (1)

har07

Reputation: 89295

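You can use the XPath node test comment() to select comment nodes. To be more specific, the query /comment()[1] returns only the first comment that is a direct child of the document node. Note that xpath() returns a list, which is why the result is printed in brackets. Here is a self-contained example: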

>>> raw = '''<?xml version="1.0"?>
... <!DOCTYPE pathway SYSTEM "https://www.kegg.jp/kegg/xml/KGML_v0.7.2_.dtd">
... <!-- Creation date: Jan 11, 2019 11:48:16 +0900 (GMT+9) -->
... <root>
... <child>content</child>
... <!-- Comment 2 -->
... </root>
... <!-- Comment 3 -->'''
>>> from lxml import etree as et
>>> root = et.fromstring(raw)
>>> first_comment = root.xpath("/comment()[1]")
>>> print(first_comment)
[<!-- Creation date: Jan 11, 2019 11:48:16 +0900 (GMT+9) -->]
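If you want the comment's content as a plain string rather than a list of nodes, a minimal sketch could look like the following. The helper name first_top_level_comment is hypothetical (it is not part of lxml); it only wraps the same /comment()[1] query and reads the node's .text attribute. It assumes that the first comment directly under the document node is the one you are after, which holds for the example above.

```python
from lxml import etree

RAW = b'''<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "https://www.kegg.jp/kegg/xml/KGML_v0.7.2_.dtd">
<!-- Creation date: Jan 11, 2019 11:48:16 +0900 (GMT+9) -->
<root>
<child>content</child>
<!-- Comment 2 -->
</root>
<!-- Comment 3 -->'''


def first_top_level_comment(root):
    """Hypothetical helper: text of the first comment that is a direct
    child of the document node, or None if there is no such comment."""
    # The leading "/" starts at the document node, so comments nested
    # inside <root> (such as "Comment 2") are not matched.
    comments = root.xpath("/comment()[1]")
    if not comments:
        return None
    # .text holds the comment body without the <!-- --> delimiters.
    return comments[0].text.strip()


root = etree.fromstring(RAW)
print(first_top_level_comment(root))
```

With the sample document this should give you the creation date line without the surrounding comment markers, which you could then pass to a date parser if needed.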

Upvotes: 3
