Reputation: 13
SAMPLE XML FILE
<ArticleSet>
<Article>
<ForeName>a</ForeName>
<LastName>b</LastName>
<Affiliation>harvard university of science. [email protected]</Affiliation>
<Keywords>-</Keywords>
</Article>
<Article>
<ForeName>a</ForeName>
<LastName>b</LastName>
<Affiliation>-</Affiliation>
<Keywords>-</Keywords>
</Article>
<Article>
<ForeName>a</ForeName>
<LastName>b</LastName>
<Affiliation>harvard university of science. [email protected]</Affiliation>
<Keywords>-</Keywords>
</Article>
</ArticleSet>
SAMPLE CODE
from xml.etree import ElementTree as etree
import re
root = etree.parse("sampleinput.xml").getroot()
for article in root.iter("Affiliation"):
    if(article.text != "-"):
        email = re.search(r'[\w\.-]+@[\w\.-]+', article.text)
        c = etree.Element("<Email>")
        c.text = email.group(0)
        etree.write(article,c)
REQUIRED OUTPUT (UPDATED XML FILE)
<?xml version="1.0"?>
<ArticleSet>
<Article>
<ForeName>a</ForeName>
<LastName>b</LastName>
<Affiliation>harvard university of science. [email protected]</Affiliation>
<Keywords>-</Keywords>
<Email>[email protected]</Email>
</Article>
<Article>
<ForeName>a</ForeName>
<LastName>b</LastName>
<Affiliation>-</Affiliation>
<Keywords>-</Keywords>
<Email>-</Email>
</Article>
<Article>
<ForeName>a</ForeName>
<LastName>b</LastName>
<Affiliation>harvard university of science. [email protected]</Affiliation>
<Keywords>-</Keywords>
<Email>[email protected]</Email>
</Article>
</ArticleSet>
I want to extract the email address from the <Affiliation> tag, create a new tag named <Email>, and store the extracted email in that tag. If <Affiliation> is equal to -, then store <Email>-</Email> in that article.
ERROR
Traceback (most recent call last):
  File "C:/Users/Ghost Rider/Documents/Python/addingTagsToXML.py", line 11, in <module>
    etree.write(article,c)
AttributeError: module 'xml.etree.ElementTree' has no attribute 'write'
Upvotes: 1
Views: 1091
Reputation: 16
You can use lxml instead of the standard xml library. This code works:
import re
from lxml import etree as et

# Open original file
tree = et.parse('t.xml')
for article in tree.iter("Affiliation"):
    if(article.text != "-"):
        email = re.search(r'[\w\.-]+@[\w\.-]+', article.text)
        # Append <Email> as the last child of the enclosing <Article>
        child = et.SubElement(article.getparent(), 'Email')
        child.text = email.group(0)
    else:
        child = et.SubElement(article.getparent(), 'Email')
        # No affiliation, so store the "-" placeholder the question asks for
        child.text = '-'
# Write back to file
tree.write('t.xml')
Upvotes: 0
Reputation: 3186
You can try this:
import re
# "import xml" alone does not load the xml.etree.ElementTree submodule
import xml.etree.ElementTree as ET

tree = ET.parse('filename.xml')
e = tree.getroot()
for article in e.findall('Article'):
    child = ET.Element("Email")
    # article[2] is the <Affiliation> element (third child of <Article>)
    if article[2].text != '-':
        email = re.search(r'[\w\.-]+@[\w\.-]+', article[2].text).group()
        child.text = email
    else:
        child.text = '-'
    # Insert <Email> after <Keywords>, as the fifth child
    article.insert(4, child)
tree.write("filename.xml")
Upvotes: 1
Reputation: 192
write is a method of the ElementTree class, not a function of the xml.etree.ElementTree module. If you want to use it, import the class like this:
from xml.etree.ElementTree import ElementTree
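Also, you shouldn't use etree as an alias for the ElementTree module: etree is the name of the package (xml.etree) that contains it, so the alias makes it easy to confuse the package, the module, and the ElementTree class.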
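As a quick check of where write lives, here is a minimal sketch. The ET alias and the two variable names are only illustrative choices, not something from the question.

```python
import xml.etree.ElementTree as ET

# The module itself exposes no write() function, which is what the
# AttributeError in the question reports.
module_has_write = hasattr(ET, "write")

# The ElementTree class (the kind of object parse() returns) does have it.
class_has_write = hasattr(ET.ElementTree, "write")

print(module_has_write, class_has_write)
```

So the tree object returned by parse() is the thing to call write on, not the module.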
Furthermore, I think you misinterpret the purpose of the write function: it can only write the resulting tree to a file. If you want to modify an element tree, you should use something like append, extend, etc. on your Element.
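The split between modifying elements and writing the tree could look roughly like the sketch below, using only the standard library. The SAMPLE string, the someone@example.org address, and the add_email_tags helper are hypothetical stand-ins for illustration. The sketch also assumes that an affiliation without a recognizable address should get "-", which goes slightly beyond what the question states.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical input; the address is a placeholder, not taken from the question.
SAMPLE = """<ArticleSet>
<Article><Affiliation>harvard university of science. someone@example.org</Affiliation><Keywords>-</Keywords></Article>
<Article><Affiliation>-</Affiliation><Keywords>-</Keywords></Article>
</ArticleSet>"""

EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+")


def add_email_tags(tree):
    """Append an <Email> child to every <Article> in the tree."""
    for article in tree.getroot().iter("Article"):
        affiliation = article.findtext("Affiliation", default="-")
        match = EMAIL_RE.search(affiliation)
        email = ET.Element("Email")
        # Assumption: fall back to "-" whenever no address is found.
        email.text = match.group(0) if match else "-"
        # append() modifies the element; nothing is written yet.
        article.append(email)
    return tree


tree = add_email_tags(ET.ElementTree(ET.fromstring(SAMPLE)))
emails = [e.text for e in tree.getroot().iter("Email")]

# write() only serializes the finished tree; a StringIO stands in for a file.
buffer = io.StringIO()
tree.write(buffer, encoding="unicode")
print(buffer.getvalue())
```

For a real file, the same function could be applied to ET.parse("sampleinput.xml") and followed by tree.write with an output path.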
Upvotes: 0