Alex

Reputation: 29

Is there a way to get the value of an XML element's attribute?

I am parsing the following XML to extract some values and store them in a file. I can read plain attributes, but when I try to pull the value I need with x.get(), as in the example below, I get None. Could you help me understand what is happening, and how I can get the values I need?

This is an example of the xml:

<?xml version='1.0' encoding='UTF-8'?>
<sbml xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2" level="3" sboTerm="SBO:0000624" version="1" xmlns="http://www.sbml.org/sbml/level3/version1/core" fbc:required="false">
   <model id="iJR904" fbc:strict="true">
      <listOfUnitDefinitions> fobo </listOfUnitDefinitions>
      <fbc:listOfObjectives fbc:activeObjective="obj"> Objectives </fbc:listOfObjectives>
      <listOfParameters>
         <parameter constant="true" id="cobra_default_lb" sboTerm="SBO:0000626" units="mmol_per_gDW_per_hr" value="-999999" />
         <parameter constant="true" id="cobra_default_ub" sboTerm="SBO:0000626" units="mmol_per_gDW_per_hr" value="999999" />
         <parameter constant="true" id="cobra_0_bound" sboTerm="SBO:0000626" units="mmol_per_gDW_per_hr" value="0" />
         <parameter constant="true" id="R_ATPM_upper_bound" sboTerm="SBO:0000625" units="mmol_per_gDW_per_hr" value="7.6" />
         <parameter constant="true" id="R_ATPM_lower_bound" sboTerm="SBO:0000625" units="mmol_per_gDW_per_hr" value="7.6" />
         <parameter constant="true" id="R_EX_glc_DASH_D_e_lower_bound" sboTerm="SBO:0000625" units="mmol_per_gDW_per_hr" value="-10" />
         <parameter constant="true" id="R_EX_o2_e_lower_bound" sboTerm="SBO:0000625" units="mmol_per_gDW_per_hr" value="-20" />
       </listOfParameters>
       <listOfCompartments> Compartments</listOfCompartments>
       <listOfSpecies> Species</listOfSpecies>
       <fbc:listOfGeneProducts> Products </fbc:listOfGeneProducts>
       <listOfReactions>
         <reaction fast="false" id="R_12PPDt" name="S-Propane-1,2-diol facilitated transport" reversible="true" fbc:lowerFluxBound="cobra_default_lb" fbc:upperFluxBound="cobra_default_ub">
         </reaction>
         <reaction fast="false" id="R_2DGLCNRx" name="2-dehydro-D-gluconate reductase (NADH)" reversible="false" fbc:lowerFluxBound="cobra_0_bound" fbc:upperFluxBound="cobra_default_ub">
         </reaction>
       </listOfReactions>
   </model>
</sbml>

This is the Python code I wrote:

import xml.etree.ElementTree as ET
mytree =  ET.parse('iJR904.xml')
myroot = mytree.getroot()

mychild = myroot[0]

for a in range(len(mychild[6])):
       print(mychild[6][a].get('id'),':',mychild[6][a].get('fbc:lowerFluxBound'))

It prints:

R_12PPDt : None
R_2DGLCNRx : None

Since I want the numeric value rather than the text, I thought I could use an if/elif chain like this:

for a in range(len(mychild[6])):
    if mychild[6][a].get('fbc:lowerFluxBound') == 'cobra_default_lb':
        print(mychild[6][a].get('id'),':',-999999)

    elif mychild[6][a].get('fbc:lowerFluxBound') == 'cobra_0_bound':
        print(mychild[6][a].get('id'),':',0)

Upvotes: 0

Views: 88

Answers (1)

dabingsou

Reputation: 2469

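The problem is caused by the XML namespaces. ElementTree expands a prefixed attribute name such as fbc:lowerFluxBound to {namespace-uri}lowerFluxBound, so get('fbc:lowerFluxBound') finds no match and returns None. The following library ignores the namespace syntax and treats the prefixed name as a normal string.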

from simplified_scrapy import SimplifiedDoc, utils

xml = utils.getFileContent('iJR904.xml')
doc = SimplifiedDoc(xml)
listOfReactions = doc.selects('listOfReactions>reaction')
# print (listOfReactions)

print([(a.id, a['fbc:lowerFluxBound']) for a in listOfReactions])

Result:

[('R_12PPDt', 'cobra_default_lb'), ('R_2DGLCNRx', 'cobra_0_bound')]
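
If you would rather stay with the standard library, the same lookup can be sketched with xml.etree.ElementTree by spelling out the namespace URI in the attribute key. This is only a minimal sketch: it assumes the file is well-formed, it inlines a trimmed copy of the XML so it runs on its own, and the names NS, SAMPLE, LOWER, param_values and lower_bounds are illustrative, not part of any library. It also resolves each bound ID to the numeric value of the matching parameter, which replaces the if/elif chain from the question.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the xmlns declarations on the <sbml> root.
# The prefixes used as dict keys here are arbitrary labels (an assumption
# of this sketch); only the URIs have to match the document.
NS = {
    "core": "http://www.sbml.org/sbml/level3/version1/core",
    "fbc": "http://www.sbml.org/sbml/level3/version1/fbc/version2",
}

# Trimmed, well-formed copy of the question's XML, inlined for illustration.
SAMPLE = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2"
      level="3" version="1" fbc:required="false">
  <model id="iJR904" fbc:strict="true">
    <listOfParameters>
      <parameter constant="true" id="cobra_default_lb" value="-999999" />
      <parameter constant="true" id="cobra_default_ub" value="999999" />
      <parameter constant="true" id="cobra_0_bound" value="0" />
    </listOfParameters>
    <listOfReactions>
      <reaction id="R_12PPDt" fbc:lowerFluxBound="cobra_default_lb" fbc:upperFluxBound="cobra_default_ub"></reaction>
      <reaction id="R_2DGLCNRx" fbc:lowerFluxBound="cobra_0_bound" fbc:upperFluxBound="cobra_default_ub"></reaction>
    </listOfReactions>
  </model>
</sbml>
"""

# For the real file, use: root = ET.parse('iJR904.xml').getroot()
root = ET.fromstring(SAMPLE)

# Map every parameter id to its numeric value. Elements live in the default
# (core) namespace, so the search path needs the "core:" prefix.
param_values = {
    p.get("id"): float(p.get("value"))
    for p in root.iterfind(".//core:parameter", NS)
}

# ElementTree stores a prefixed attribute under '{namespace-uri}localname'.
LOWER = f"{{{NS['fbc']}}}lowerFluxBound"

# Resolve each reaction's lower bound ID to the parameter's value.
lower_bounds = {}
for reaction in root.iterfind(".//core:reaction", NS):
    bound_id = reaction.get(LOWER)  # e.g. 'cobra_default_lb'
    lower_bounds[reaction.get("id")] = param_values[bound_id]

for reaction_id, value in lower_bounds.items():
    print(reaction_id, ":", value)
```

Unprefixed attributes such as id belong to no namespace, which is why get('id') already worked in the question. Searching by tag name instead of indexing with mychild[6] also keeps the code working if the order of the child elements changes.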

Upvotes: 1
