MarkF6

Reputation: 503

Sort XML files by tag content

I have a lot of XML files that aren't sorted. I'd like to sort them by the content of a tag named "title". I know the order of the titles: 1.) editorial 2.) content 3.) club etc. Each file has a different "title" content.

The XML structure of the first file looks like this:

<article>
<someTags>
</someTags>
<title>editorial</title>
</article>

How can I go through all the files and sort them into the defined order? The resulting file names could look like "sorted_01.xml", "sorted_02.xml", and so on. Thanks a lot for any help! :)

Upvotes: 0

Views: 1579

Answers (2)

Guy Gavriely

Reputation: 11396

Another way to do it (untested):

import glob
import os
from lxml import etree

d = {}  # maps title text -> file name
titles = ['editorial', 'content', 'club']  # ordered titles; add the rest here
for fname in glob.glob('*.xml'):
    tree = etree.parse(fname)
    # Text of the first <title> element in the file
    title = tree.xpath('//title/text()')[0].strip()
    d[title] = fname
for idx, title in enumerate(titles, 1):
    if title in d:  # skip titles that no file uses
        os.rename(d[title], 'sorted_%02d.xml' % idx)

Upvotes: 1

falsetru

Reputation: 369074

Use sort or sorted with a key function that returns the text of the title tag.

import glob
import xml.etree.ElementTree as ET

def get_title(filepath):
    # Find the `title` element and return its text.
    tree = ET.parse(filepath)
    return tree.find('.//title').text

filepaths = glob.glob('*.xml')
print(sorted(filepaths, key=get_title))

See demo.
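Note that sorting on the title text alone gives alphabetical order, while the question asks for a predefined order. One possible way to handle that is to let the key function return the position of the title in a list. The following is only a sketch under some assumptions: TITLE_ORDER holds just the three titles named in the question and has to be extended, the helper names order_key and plan_renames are made up for illustration, and files with an unknown or missing title are placed last.

```python
import glob
import os
import xml.etree.ElementTree as ET

# Assumed order, taken from the question; extend it with the remaining titles.
TITLE_ORDER = ['editorial', 'content', 'club']
RANK = {title: position for position, title in enumerate(TITLE_ORDER)}

def get_title(filepath):
    """Return the stripped text of the first <title> element, or '' if absent."""
    element = ET.parse(filepath).find('.//title')
    if element is None or element.text is None:
        return ''
    return element.text.strip()

def order_key(filepath):
    """Rank a file by its title; unknown titles sort after all known ones."""
    return RANK.get(get_title(filepath), len(TITLE_ORDER))

def plan_renames(directory='.'):
    """Return (old_path, new_path) pairs in the defined order; no file is touched."""
    paths = sorted(glob.glob(os.path.join(directory, '*.xml')), key=order_key)
    return [
        (path, os.path.join(directory, 'sorted_%02d.xml' % number))
        for number, path in enumerate(paths, 1)
    ]
```

The plan can be printed and checked first, then applied with something like `for old, new in plan_renames(): os.rename(old, new)`. If the directory already contains files named sorted_NN.xml, renaming in place may overwrite them, so writing into a separate output directory is probably safer.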

Upvotes: 1
