user19413311

XPath for XML: SyntaxError: cannot use absolute path on element

I have this XML that I want to parse with Python's xml.etree.ElementTree:

<draw:page draw:name="page3" draw:style-name="dp3" draw:master-page-name="Blue_5f_Curve1_5f_" presentation:presentation-page-layout-name="AL2T1" presentation:use-date-time-name="dtd1">
                <office:forms form:automatic-focus="false" form:apply-design-mode="false"/>
                <draw:frame presentation:style-name="pr4" draw:text-style-name="P1" draw:layer="layout" svg:width="26cm" svg:height="10cm" svg:x="1cm" svg:y="3cm" presentation:class="outline" presentation:user-transformed="true">
                    <draw:text-box>
                        <text:list text:style-name="L2">
                            <text:list-header>
                                <text:p>
                                    <text:span text:style-name="T2">Sources</text:span>
                                </text:p>
                                <text:list>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">Medium.com</text:span>
                                        </text:p>
                                    </text:list-item>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">Livres pdf </text:span>
                                        </text:p>
                                        <text:list>
                                            <text:list-item>
                                                <text:p>
                                                    <text:span text:style-name="T3">
                                                        Docker - 
                                                        <text:s/>
                                                        Concepts fondamentaux et déploiement d'applications distribuées edition ENI
                                                    </text:span>
                                                </text:p>
                                            </text:list-item>
                                            <text:list-item>
                                                <text:p>
                                                    <text:span text:style-name="T3">Docker - Prise en main et mise en pratique sur une architecture micro-services (JP. Gouigoux ENI)</text:span>
                                                </text:p>
                                            </text:list-item>
                                            <text:list-item>
                                                <text:p>
                                                    <text:span text:style-name="T3">Docker – Pratique des architectures à base de conteneurs Edition Dunod</text:span>
                                                </text:p>
                                            </text:list-item>
                                        </text:list>
                                    </text:list-item>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">Youtube</text:span>
                                        </text:p>
                                    </text:list-item>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">Stackoverflow</text:span>
                                        </text:p>
                                    </text:list-item>
                                    <text:list-item>
                                        <text:p>
                                            <text:span text:style-name="T3">playwithdocker</text:span>
                                        </text:p>
                                    </text:list-item>
                                </text:list>
                            </text:list-header>
                        </text:list>
                    </draw:text-box>
                </draw:frame>
                <draw:frame presentation:style-name="pr3" draw:text-style-name="P1" draw:layer="layout" svg:width="26cm" svg:height="1.328cm" svg:x="1cm" svg:y="0.5cm" presentation:class="title" presentation:user-transformed="true">
                    <draw:text-box>
                        <text:p text:style-name="P4">
                            <text:span text:style-name="T4">Docker</text:span>
                        </text:p>
                    </draw:text-box>
                </draw:frame>
                <presentation:notes draw:style-name="dp2">
                    <draw:page-thumbnail draw:style-name="gr1" draw:layer="layout" svg:width="14.848cm" svg:height="11.136cm" svg:x="3.075cm" svg:y="2.257cm" draw:page-number="3" presentation:class="page"/>
                    <draw:frame presentation:style-name="pr5" draw:text-style-name="P5" draw:layer="layout" svg:width="16.799cm" svg:height="13.364cm" svg:x="2.1cm" svg:y="14.107cm" presentation:class="notes" presentation:placeholder="true" presentation:user-transformed="true">
                        <draw:text-box/>
                    </draw:frame>
                </presentation:notes>
            </draw:page>
            

Ultimately, I want to get the text of every text:p element, including the text of any children it has, such as <text:span ...>.

My Python code is:

    ostr = self.m_odf.read('content.xml')
    doc = ET.fromstring(ostr)
    self.pages = doc.findall("//*[@name='draw:page']")  # 'text:p')

I want to first get a list of the draw:page nodes and then search inside those nodes for the text:p elements.

My code raises the error "SyntaxError: cannot use absolute path on element".

I'm not used to this prefix:name syntax in XML, so I can't work out how to query it with XPath; "//*[@name='draw:page']" doesn't seem to work.

Could you help me, please?

Upvotes: 0

Views: 478

Answers (1)

Siebe Jongebloed

Reputation: 4834

Your XML snippet is missing a lot of namespace declarations, but they are surely declared at a higher level in the XML tree. If you are not able to use those namespaces, you can use the local-name() function to select elements based on their names without the namespace prefix.
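If you can use the namespaces, here is a minimal sketch of that route with xml.etree.ElementTree. The prefix-to-URI map uses the namespace URIs defined by the OpenDocument format; the SAMPLE document and the page_paragraphs helper are made up for illustration and stand in for your content.xml. Note the leading "." in the paths: a path that starts with "/" is what triggers "cannot use absolute path on element".

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI, as declared on the root of an ODF content.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical, heavily trimmed stand-in for the real content.xml.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <draw:page draw:name="page3">
      <draw:frame><draw:text-box>
        <text:p><text:span>Docker</text:span></text:p>
        <text:p><text:span>Docker - <text:s/>Concepts</text:span></text:p>
      </draw:text-box></draw:frame>
    </draw:page>
  </office:body>
</office:document-content>"""


def page_paragraphs(xml_text):
    """Return {page name: [paragraph text, ...]} for every draw:page."""
    root = ET.fromstring(xml_text)
    result = {}
    # ".//" keeps the path relative to root; a leading "//" raises SyntaxError.
    for page in root.findall(".//draw:page", NS):
        # Attributes are namespaced too: draw:name is stored as "{uri}name".
        name = page.get("{%s}name" % NS["draw"])
        result[name] = [
            # itertext() gathers the text of the paragraph and all its children.
            "".join(p.itertext()).strip()
            for p in page.findall(".//text:p", NS)
        ]
    return result


print(page_paragraphs(SAMPLE))
```

Using itertext() means you do not have to care whether the text sits directly in text:p or inside a text:span.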

In the end you could try this XPath:

"//*[local-name()='page']//*[local-name()='p']//node()"

where the last part selects all descendants (both text nodes and element nodes).
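One caveat: xml.etree.ElementTree implements only a subset of XPath and does not support functions such as local-name() or the node() test, so that expression needs a full XPath 1.0 engine. If you want to stay with ElementTree, the same idea (matching on the name without its namespace) can be approximated in plain Python. The following is only a sketch: the helper names and the SAMPLE document with its urn:example namespaces are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Strip the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return element.tag.rsplit("}", 1)[-1]


def texts_by_local_name(root, page_name="page", para_name="p"):
    """Collect the text of every <p> found below a <page>, ignoring namespaces."""
    out = []
    for page in root.iter():
        if local_name(page) != page_name:
            continue
        for para in page.iter():
            if local_name(para) == para_name:
                # Join the paragraph's own text with that of all its children.
                out.append("".join(para.itertext()).strip())
    return out


# Hypothetical sample; the namespace URIs here are placeholders.
SAMPLE = """<doc xmlns:draw="urn:example:draw" xmlns:text="urn:example:text">
  <draw:page><text:p><text:span>Sources</text:span></text:p></draw:page>
  <draw:page><text:p>Youtube</text:p><text:p><text:span>Stackoverflow</text:span></text:p></draw:page>
</doc>"""

print(texts_by_local_name(ET.fromstring(SAMPLE)))
```

On Python 3.8 and later, ElementTree also accepts a namespace wildcard in paths, for example findall(".//{*}page"), which gives a similar effect with less code.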

Upvotes: 0
