Reputation: 503
I have a lot of XML files that aren't sorted. I'd like to sort them by the content of a tag named "title". I know the order of the titles: 1.) editorial 2.) content 3.) club etc. Each file has a different "title" content.
The XML-structure of file number 1 looks like this:
<article>
<someTags>
</someTags>
<title>editorial</title>
</article>
How can I go through all the files and sort them by the defined order? The resulting file names could look like "sorted_01.xml", "sorted_02.xml" and so on. How can I do this? Thanks a lot for any help! :)
Upvotes: 0
Views: 1579
Reputation: 11396
Another way to do it (untested):
import glob, os
from lxml import etree

d = {}  # maps each title to the file that contains it
titles = ['editorial', 'content', 'club']  # ordered titles; add the rest in order

for fname in glob.glob('*.xml'):
    tree = etree.parse(fname)
    # Key on the title alone so the lookup in the rename loop finds it
    title = tree.xpath('//title/text()')[0]
    d[title] = fname

# Number the files according to the position of their title in `titles`
for idx, title in enumerate(titles, 1):
    os.rename(d[title], 'sorted_%02d.xml' % idx)
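The rename loop above raises a KeyError as soon as one of the ordered titles has no matching file. A minimal sketch of one way to guard against that is to build the list of renames first and only then apply it. The helper names `plan_renames` and `apply_renames`, and the example file names, are made up for illustration and are not part of the answer above.

```python
import os

def plan_renames(title_by_file, titles, pattern='sorted_%02d.xml'):
    """Return (old_name, new_name) pairs in the order given by `titles`.

    `title_by_file` maps a file name to the title found in that file.
    """
    file_by_title = {title: fname for fname, title in title_by_file.items()}
    plan = []
    for idx, title in enumerate(titles, 1):
        fname = file_by_title.get(title)
        if fname is None:
            continue  # no file carries this title; its number stays unused
        plan.append((fname, pattern % idx))
    return plan

def apply_renames(plan):
    """Rename the files, refusing to overwrite an existing target."""
    for old, new in plan:
        if os.path.exists(new):
            raise FileExistsError(new)
        os.rename(old, new)

# Example with made-up file names (nothing is renamed here):
example = {'a.xml': 'club', 'b.xml': 'editorial', 'c.xml': 'content'}
print(plan_renames(example, ['editorial', 'content', 'club']))
```

Printing the plan before calling `apply_renames` works as a dry run, which seems useful when many files are involved.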
Upvotes: 1
Reputation: 369074
Use `sort` or `sorted` with a function that returns the text of the `title` tag as the key function.
import glob
import xml.etree.ElementTree as ET

def get_title(filepath):
    # Find the `title` element and return the text of the element.
    tree = ET.parse(filepath)
    return tree.find('.//title').text

filepaths = glob.glob('*.xml')
print(sorted(filepaths, key=get_title))
See demo.
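Using the title text itself as the key sorts the files alphabetically by title. To follow the order from the question instead (editorial, content, club, ...), the key can be the position of the title in a list. The following is a sketch under that assumption. `TITLES`, `RANK`, `title_rank` and `get_title_from_string` are hypothetical names, and the XML strings are made-up stand-ins for real files.

```python
import xml.etree.ElementTree as ET

# Desired order, taken from the question; extend with the remaining titles.
TITLES = ['editorial', 'content', 'club']
RANK = {title: i for i, title in enumerate(TITLES)}

def title_rank(title):
    """Position of `title` in the desired order; unknown titles sort last."""
    return RANK.get(title, len(RANK))

def get_title_from_string(xml_text):
    """Same lookup as for a file, but on an XML string so the example is self-contained."""
    return ET.fromstring(xml_text).find('.//title').text

# Made-up documents keyed by file name:
docs = {
    'x.xml': '<article><title>club</title></article>',
    'y.xml': '<article><title>editorial</title></article>',
    'z.xml': '<article><title>content</title></article>',
}
ordered = sorted(docs, key=lambda name: title_rank(get_title_from_string(docs[name])))
print(ordered)
```

With real files, the same idea would be `sorted(glob.glob('*.xml'), key=lambda p: title_rank(get_title(p)))`, reusing `get_title` from this answer.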
Upvotes: 1