Reputation: 2886
I have an XML file which looks like this:
<Organism>
<Name>Bacillus halodurans C-125</Name>
<Enzyme>M.BhaII</Enzyme>
<Motif>GGCC</Motif>
<Enzyme>M1.BhaI</Enzyme>
<Motif>GCATC</Motif>
<Enzyme>M2.BhaI</Enzyme>
<Motif>GCATC</Motif>
</Organism>
<Organism>
<Name>Bacteroides eggerthii 1_2_48FAA</Name>
</Organism>
I'm trying to write it into a CSV file like this:
Bacillus halodurans, GGCC
Bacillus halodurans, GCATC
Bacillus halodurans, GCATC
Bacteroides,
My approach is to create a list of tuples, each pairing an organism name with one of its motifs. I tried this using the ElementTree module:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
rebase = tree.getroot()
list = []
for organisms in rebase.findall('Organism'):
    name = organisms.find('Name').text
    for each_organism in organisms.findall('Motif'):
        try:
            motif = organisms.find('Motif').text
            print name, motif
        except AttributeError:
            print name
However, the output I get looks like this:
Bacillus halodurans, GGCC
Bacillus halodurans, GGCC
Bacillus halodurans, GGCC
Only the first motif gets recorded. This is my first time working with ElementTree, so it's slightly confusing. Any help will be greatly appreciated.
I don't need help with writing to a CSV file.
Upvotes: 1
Views: 282
Reputation: 473803
The only thing you need to fix is to replace:
motif = organisms.find('Motif').text
with:
motif = each_organism.text
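You are already iterating over the Motif nodes inside an Organism, so the each_organism loop variable already holds one Motif element per iteration. Calling organisms.find('Motif') inside the loop returns the first Motif child every time, which is why GGCC is printed three times.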
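To make the difference concrete, here is a small self-contained sketch in Python 3 syntax (so print is a function). The <Rebase> root element is an assumption, since the question does not show what wraps the Organism tags:

```python
import xml.etree.ElementTree as ET

# Assumed wrapper: the question does not show the root element,
# so <Rebase> is a hypothetical name used only for this example.
xml_text = """<Rebase>
<Organism>
<Name>Bacillus halodurans C-125</Name>
<Enzyme>M.BhaII</Enzyme>
<Motif>GGCC</Motif>
<Enzyme>M1.BhaI</Enzyme>
<Motif>GCATC</Motif>
<Enzyme>M2.BhaI</Enzyme>
<Motif>GCATC</Motif>
</Organism>
</Rebase>"""

organism = ET.fromstring(xml_text).find('Organism')

# find() returns the first matching child, no matter how often it is called.
first_only = [organism.find('Motif').text for _ in organism.findall('Motif')]

# Reading .text from the loop variable yields each Motif in document order.
each_motif = [motif.text for motif in organism.findall('Motif')]

print(first_only)  # ['GGCC', 'GGCC', 'GGCC']
print(each_motif)  # ['GGCC', 'GCATC', 'GCATC']
```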
I would also change the variable names to avoid confusion: each_organism actually holds a Motif, not an organism. I also don't see the need for try/except inside the loop over Motif tags, because findall only yields elements that exist. If the Name tag can be missing, you can follow the "Easier to ask for forgiveness than permission" (EAFP) approach and catch the error:
for organism in rebase.findall('Organism'):
    try:
        name = organism.find('Name').text
    except AttributeError:
        continue
    for motif in organism.findall('Motif'):
        motif = motif.text
        print name, motif
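Note that the loop above prints nothing for an organism with no Motif tags, while the desired output in the question keeps a line with just the name. A minimal sketch of one way to build the list of (name, motif) tuples with that case included, in Python 3 syntax; the collect_pairs function name and the <Rebase> root element are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Assumed wrapper: <Rebase> is a hypothetical root, not shown in the question.
xml_text = """<Rebase>
<Organism>
<Name>Bacillus halodurans C-125</Name>
<Enzyme>M.BhaII</Enzyme>
<Motif>GGCC</Motif>
<Enzyme>M1.BhaI</Enzyme>
<Motif>GCATC</Motif>
<Enzyme>M2.BhaI</Enzyme>
<Motif>GCATC</Motif>
</Organism>
<Organism>
<Name>Bacteroides eggerthii 1_2_48FAA</Name>
</Organism>
</Rebase>"""


def collect_pairs(root):
    """Return (name, motif) tuples; motif is '' for organisms without one."""
    pairs = []
    for organism in root.findall('Organism'):
        try:
            name = organism.find('Name').text
        except AttributeError:
            continue  # no Name tag: skip this organism
        motifs = [motif.text for motif in organism.findall('Motif')]
        if not motifs:
            # Keep the organism with an empty motif column.
            pairs.append((name, ''))
        for motif in motifs:
            pairs.append((name, motif))
    return pairs


pairs = collect_pairs(ET.fromstring(xml_text))
for name, motif in pairs:
    print(name, motif)
```

The resulting list can be handed to whatever CSV-writing code you already have, one tuple per row.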
Upvotes: 2