Yugam Uppal

Reputation: 13

How to add an XML tag based on a specific condition using Python

sample XML File

<ArticleSet>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. [email protected]</Affiliation>
        <Keywords>-</Keywords>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>-</Affiliation>
        <Keywords>-</Keywords>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. [email protected]</Affiliation>
        <Keywords>-</Keywords>
    </Article>
</ArticleSet>

SAMPLE CODE

from xml.etree import ElementTree as etree
import re

root = etree.parse("sampleinput.xml").getroot()

for article in root.iter("Affiliation"):
    if(article.text != "-"):
        email = re.search(r'[\w\.-]+@[\w\.-]+', article.text)
        c = etree.Element("<Email>")
        c.text = email.group(0)
        etree.write(article,c)

REQUIRED OUTPUT (UPDATED XML FILE)

<?xml version="1.0"?>
<ArticleSet>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. [email protected]</Affiliation>
        <Keywords>-</Keywords>
        <Email>[email protected]</Email>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>-</Affiliation>
        <Keywords>-</Keywords>
        <Email>-</Email>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. [email protected]</Affiliation>
        <Keywords>-</Keywords>
        <Email>[email protected]</Email>
    </Article>
</ArticleSet>

I want to extract the email address from each <Affiliation> tag, create a new tag named <Email>, and store the extracted address in it. If <Affiliation> is equal to -, then <Email>-</Email> should be stored in that article instead.

ERROR

Traceback (most recent call last):
  File "C:/Users/Ghost Rider/Documents/Python/addingTagsToXML.py", line 11, in <module>
    etree.write(article,c)
AttributeError: module 'xml.etree.ElementTree' has no attribute 'write'

Upvotes: 1

Views: 1091

Answers (3)

MD.Moniruzzaman

Reputation: 16

You can use lxml instead of the standard xml library. This code works fine:

import re
from lxml import etree as et

# Parse the original file
tree = et.parse('t.xml')
for affiliation in tree.iter("Affiliation"):
    # Search for an address; "-" (or empty text) simply yields no match
    match = re.search(r'[\w\.-]+@[\w\.-]+', affiliation.text or '')
    # Append <Email> as the last child of the enclosing <Article>
    child = et.SubElement(affiliation.getparent(), 'Email')
    child.text = match.group(0) if match else '-'

# Write the result back to the same file
tree.write('t.xml')

Upvotes: 0

Vikas Periyadath

Reputation: 3186

You can try this:

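import re
import xml.etree.ElementTree as ET  # a bare "import xml" does not load xml.etree

tree = ET.parse('filename.xml')
root = tree.getroot()

for article in root.findall('Article'):
    child = ET.Element("Email")
    # Look up <Affiliation> by name instead of by position
    text = article.findtext('Affiliation') or ''
    match = re.search(r'[\w\.-]+@[\w\.-]+', text)
    # Fall back to "-" when there is no address to extract
    child.text = match.group() if match else '-'
    article.append(child)  # add <Email> as the last child of <Article>
tree.write("filename.xml")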
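The required output in the question starts with an XML declaration and keeps every tag on its own line, which a plain write call will not produce for a freshly added element. Here is a rough sketch of one way to get closer to that, assuming Python 3.9 or later (for ET.indent) and using an in-memory buffer and a made-up one-article document in place of the real file:

```python
import io
import xml.etree.ElementTree as ET

# Illustrative document, not the asker's file
root = ET.fromstring(
    '<ArticleSet><Article><Keywords>-</Keywords></Article></ArticleSet>'
)
email = ET.SubElement(root[0], 'Email')
email.text = '-'

# Re-indent the whole tree in place (available from Python 3.9)
ET.indent(root, space='    ')

# Ask write() for an XML declaration; a BytesIO stands in for the file
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding='utf-8', xml_declaration=True)
xml_text = buf.getvalue().decode('utf-8')
print(xml_text)
```

With a real file you would pass the file name to write instead of the buffer.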

Upvotes: 1

Attila Bognár

Reputation: 192

If you want to use write, import the ElementTree class, because write is a method of ElementTree objects and not a function of the module:

from xml.etree.ElementTree import ElementTree

I would also avoid using etree as an alias for the ElementTree module: etree is the name of the enclosing package (xml.etree), so the alias is easy to confuse with it.

Furthermore, I think you misunderstand what write does: it only serializes the finished tree to a file. To modify a tree, call methods such as append, extend, or insert on an Element.
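A minimal sketch of that distinction, assuming the standard-library xml.etree.ElementTree. The helper name add_email, the inline document, and the example address are made up for illustration, and an in-memory buffer replaces the output file:

```python
import io
import re
import xml.etree.ElementTree as ET

EMAIL_RE = re.compile(r'[\w\.-]+@[\w\.-]+')

def add_email(article):
    """Append an <Email> child to one <Article> (hypothetical helper)."""
    text = article.findtext('Affiliation') or ''
    match = EMAIL_RE.search(text)
    email = ET.Element('Email')
    email.text = match.group(0) if match else '-'
    article.append(email)  # modifying the tree happens on the Element
    return email

# Illustrative input; the address is a placeholder, not from the question
root = ET.fromstring(
    '<ArticleSet>'
    '<Article><Affiliation>some university. someone@example.org</Affiliation></Article>'
    '<Article><Affiliation>-</Affiliation></Article>'
    '</ArticleSet>'
)
for article in root.findall('Article'):
    add_email(article)

emails = [e.text for e in root.iter('Email')]

# Serializing is a separate step on an ElementTree object
buf = io.BytesIO()
ET.ElementTree(root).write(buf)
serialized = buf.getvalue()
```

Passing a file name to write in place of the buffer would save the updated document to disk.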

Upvotes: 0
