I have this XML that I want to parse with Python's xml.etree.ElementTree:
<draw:page draw:name="page3" draw:style-name="dp3" draw:master-page-name="Blue_5f_Curve1_5f_" presentation:presentation-page-layout-name="AL2T1" presentation:use-date-time-name="dtd1">
<office:forms form:automatic-focus="false" form:apply-design-mode="false"/>
<draw:frame presentation:style-name="pr4" draw:text-style-name="P1" draw:layer="layout" svg:width="26cm" svg:height="10cm" svg:x="1cm" svg:y="3cm" presentation:class="outline" presentation:user-transformed="true">
<draw:text-box>
<text:list text:style-name="L2">
<text:list-header>
<text:p>
<text:span text:style-name="T2">Sources</text:span>
</text:p>
<text:list>
<text:list-item>
<text:p>
<text:span text:style-name="T3">Medium.com</text:span>
</text:p>
</text:list-item>
<text:list-item>
<text:p>
<text:span text:style-name="T3">Livres pdf </text:span>
</text:p>
<text:list>
<text:list-item>
<text:p>
<text:span text:style-name="T3">
Docker -
<text:s/>
Concepts fondamentaux et déploiement d'applications distribuées edition ENI
</text:span>
</text:p>
</text:list-item>
<text:list-item>
<text:p>
<text:span text:style-name="T3">Docker - Prise en main et mise en pratique sur une architecture micro-services (JP. Gouigoux ENI)</text:span>
</text:p>
</text:list-item>
<text:list-item>
<text:p>
<text:span text:style-name="T3">Docker – Pratique des architectures à base de conteneurs Edition Dunod</text:span>
</text:p>
</text:list-item>
</text:list>
</text:list-item>
<text:list-item>
<text:p>
<text:span text:style-name="T3">Youtube</text:span>
</text:p>
</text:list-item>
<text:list-item>
<text:p>
<text:span text:style-name="T3">Stackoverflow</text:span>
</text:p>
</text:list-item>
<text:list-item>
<text:p>
<text:span text:style-name="T3">playwithdocker</text:span>
</text:p>
</text:list-item>
</text:list>
</text:list-header>
</text:list>
</draw:text-box>
</draw:frame>
<draw:frame presentation:style-name="pr3" draw:text-style-name="P1" draw:layer="layout" svg:width="26cm" svg:height="1.328cm" svg:x="1cm" svg:y="0.5cm" presentation:class="title" presentation:user-transformed="true">
<draw:text-box>
<text:p text:style-name="P4">
<text:span text:style-name="T4">Docker</text:span>
</text:p>
</draw:text-box>
</draw:frame>
<presentation:notes draw:style-name="dp2">
<draw:page-thumbnail draw:style-name="gr1" draw:layer="layout" svg:width="14.848cm" svg:height="11.136cm" svg:x="3.075cm" svg:y="2.257cm" draw:page-number="3" presentation:class="page"/>
<draw:frame presentation:style-name="pr5" draw:text-style-name="P5" draw:layer="layout" svg:width="16.799cm" svg:height="13.364cm" svg:x="2.1cm" svg:y="14.107cm" presentation:class="notes" presentation:placeholder="true" presentation:user-transformed="true">
<draw:text-box/>
</draw:frame>
</presentation:notes>
</draw:page>
Ultimately, I want to get the text of every text:p element, including the text of any child elements such as <text:span ...>.
My Python code is:
ostr = self.m_odf.read('content.xml')
doc = ET.fromstring(ostr)
self.pages = doc.findall("//*[@name='draw:page']")#'text:p')
I want to first get a list of the draw:page nodes and then search inside those nodes for the text:p elements.
My code raises the error "SyntaxError: cannot use absolute path on element".
I'm not used to this tag:x syntax in XML, so I can't work out how to query it with XPath; ("//*[@name='draw:page']") doesn't seem to work.
Could you help me, please?
Upvotes: 0
Views: 478
Reputation: 4834
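In your XML a lot of namespace declarations are missing, but they are surely declared at a higher level in the XML tree. If you are not able to use those namespaces, you can use the local-name() function to select elements based on their names without the namespace prefix.
In the end you could try this XPath:
"//*[local-name()='page']//*[local-name()='p']//node()"
where the last part selects all descendants (both text nodes and element nodes). Note that local-name() and node() need a full XPath 1.0 engine such as lxml; xml.etree.ElementTree supports only a limited XPath subset, and it rejects a path starting with "//" when called on an element (use ".//" instead).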
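For reference, here is a minimal sketch of the same idea using only xml.etree.ElementTree. It assumes the prefixes are bound to the standard OpenDocument namespace URIs on the root of content.xml (worth checking against your own file), and SAMPLE is a hypothetical, trimmed-down stand-in for that file. The function names are made up for illustration. The first function passes a prefix map to findall(); the second uses the "{*}" namespace wildcard, which needs Python 3.8 or later and plays the role of local-name().

```python
import xml.etree.ElementTree as ET

# Assumed prefix-to-URI bindings (standard OpenDocument namespaces);
# verify them against the root element of your own content.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical, trimmed-down stand-in for content.xml.
SAMPLE = """\
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <draw:page draw:name="page3">
    <draw:frame>
      <draw:text-box>
        <text:p><text:span>Sources</text:span></text:p>
        <text:p><text:span>Medium.com</text:span></text:p>
      </draw:text-box>
    </draw:frame>
  </draw:page>
</office:document-content>
"""


def page_paragraphs(xml_text):
    """Map each draw:page name to the text of its text:p descendants."""
    root = ET.fromstring(xml_text)
    # Attributes are namespaced too, so draw:name becomes "{uri}name".
    name_attr = "{%s}name" % NS["draw"]
    result = {}
    # ".//" is a relative search; a leading "//" is what raises
    # "cannot use absolute path on element".
    for page in root.findall(".//draw:page", NS):
        paragraphs = page.findall(".//text:p", NS)
        # itertext() gathers the text of the element and all its children.
        result[page.get(name_attr)] = [
            "".join(p.itertext()).strip() for p in paragraphs
        ]
    return result


def page_paragraphs_any_ns(xml_text):
    """Same search, ignoring namespaces via the {*} wildcard (Python 3.8+)."""
    root = ET.fromstring(xml_text)
    return [
        "".join(p.itertext()).strip()
        for page in root.findall(".//{*}page")
        for p in page.findall(".//{*}p")
    ]


if __name__ == "__main__":
    print(page_paragraphs(SAMPLE))
    print(page_paragraphs_any_ns(SAMPLE))
```

One caveat: itertext() skips empty elements such as <text:s/>, which stands for a space in ODF, so text around it is joined without that space unless you handle it separately.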
Upvotes: 0