Beginner

Reputation: 2886

Converting XML file into CSV

I have an XML file which looks like this:

<Organism>
 <Name>Bacillus halodurans C-125</Name>
  <Enzyme>M.BhaII</Enzyme>
   <Motif>GGCC</Motif>
  <Enzyme>M1.BhaI</Enzyme>
   <Motif>GCATC</Motif>
  <Enzyme>M2.BhaI</Enzyme>
   <Motif>GCATC</Motif>
</Organism>
<Organism>
 <Name>Bacteroides eggerthii 1_2_48FAA</Name>
</Organism>

I'm trying to write it into a CSV file like this:

Bacillus halodurans, GGCC
Bacillus halodurans, GCATC
Bacillus halodurans, GCATC
Bacteroides, 

My approach is to build a list of tuples, each pairing an organism name with one of its motifs. I tried this using the ElementTree module:

import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
rebase = tree.getroot()

list = []

for organisms in rebase.findall('Organism'):
        name = organisms.find('Name').text
        for each_organism in organisms.findall('Motif'):
            try:
                motif = organisms.find('Motif').text
                print name, motif
            except AttributeError:
                print name

However the output I get looks like this:

Bacillus halodurans, GGCC
Bacillus halodurans, GGCC
Bacillus halodurans, GGCC

Only the first motif gets recorded. This is my first time working with ElementTree, so it's slightly confusing. Any help would be greatly appreciated.

I don't need help with writing to a CSV file.

Upvotes: 1

Views: 282

Answers (1)

alecxe

Reputation: 473803

The only thing you need to fix is to replace:

motif = organisms.find('Motif').text

with:

motif = each_organism.text

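You are already iterating over the Motif nodes inside an Organism, so the each_organism loop variable holds the current Motif element. Calling organisms.find('Motif') inside the loop returns the first Motif child every time, which is why GGCC is printed three times.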
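To make the difference between find() and findall() concrete, here is a minimal sketch. It is written for Python 3 and uses a shortened, inline copy of the question's first Organism element; the variable names first_only and each_motif are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the question's first <Organism> element (illustrative only).
organism = ET.fromstring(
    "<Organism>"
    "<Name>Bacillus halodurans C-125</Name>"
    "<Enzyme>M.BhaII</Enzyme><Motif>GGCC</Motif>"
    "<Enzyme>M1.BhaI</Enzyme><Motif>GCATC</Motif>"
    "</Organism>"
)

# find() returns only the first matching child, however often it is called.
first_only = [organism.find("Motif").text for _ in organism.findall("Motif")]

# findall() returns every matching child in document order.
each_motif = [motif.text for motif in organism.findall("Motif")]

print(first_only)
print(each_motif)
```

The first list repeats GGCC once per Motif element, which mirrors the output in the question; the second list contains each motif in turn.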


I would also rename the variables to avoid confusion. I also don't see the need for try/except inside the loop over the Motif tags. If the Name tag can be missing, you can follow the EAFP approach ("easier to ask for forgiveness than permission") and catch the error:

for organism in rebase.findall('Organism'):
    try:
        name = organism.find('Name').text
    except AttributeError:
        continue

    for motif in organism.findall('Motif'):
        print name, motif.text
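Note that a loop over findall('Motif') prints nothing for an organism with no Motif children, while the desired output in the question includes a row with an empty motif for such organisms. One possible way to cover that case is sketched below in Python 3. Several things here are assumptions rather than part of the question: the `<Rebase>` root element wrapping the organisms, the helper name organism_motif_rows, and the use of an explicit None check in place of try/except. The sketch also keeps the full organism names instead of the shortened ones shown in the question.

```python
import xml.etree.ElementTree as ET

# Assumption: the real file wraps the <Organism> elements in a single root.
# The name <Rebase> is made up for this example.
SAMPLE = """<Rebase>
<Organism>
 <Name>Bacillus halodurans C-125</Name>
  <Enzyme>M.BhaII</Enzyme>
   <Motif>GGCC</Motif>
  <Enzyme>M1.BhaI</Enzyme>
   <Motif>GCATC</Motif>
  <Enzyme>M2.BhaI</Enzyme>
   <Motif>GCATC</Motif>
</Organism>
<Organism>
 <Name>Bacteroides eggerthii 1_2_48FAA</Name>
</Organism>
</Rebase>"""


def organism_motif_rows(root):
    """Return (name, motif) tuples; an organism without motifs gets one (name, '') row."""
    rows = []
    for organism in root.findall("Organism"):
        name_element = organism.find("Name")
        if name_element is None:
            continue  # skip organisms that have no <Name> child
        name = name_element.text
        motifs = [motif.text for motif in organism.findall("Motif")]
        if not motifs:
            rows.append((name, ""))  # keep the organism, with an empty motif
        for motif in motifs:
            rows.append((name, motif))
    return rows


rows = organism_motif_rows(ET.fromstring(SAMPLE))
for name, motif in rows:
    print(name, motif)
```

Returning a list of tuples keeps the parsing separate from the output step, so the rows can be handed to whatever CSV-writing code is already in place.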

Upvotes: 2
