shan455

Reputation: 71

How to split a single XML file into multiple based on tags

I have an XML file like the one below, and I want to split it into multiple files based on its tags.

<?xml version="1.0" encoding="UTF-8"?>
<EMPRMART CREATION_DATE="08/20/2018 18:06:44" REPOSITORY_VERSION="187.96">
<REPOSITORY NAME="REP_DEV" VERSION="187" CODEPAGE="UTF-8" DATABASETYPE="Sybase">
<FOLDER NAME="MC_DEV" 
    <CONFIG DESCRIPTION ="Default ORDER configuration object" ISDEFAULT ="YES" NAME ="default_ORDER_config" VERSIONNUMBER ="1">
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </CONFIG>
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
    <LOCATION DESCRIPTION ="" ISENABLED ="YES" 
    </LOCATION>
</FOLDER>
</REPOSITORY>
</EMPRMART>

Below is the code I tried, but it writes every single line to a new file:

awk  '
    BEGIN { RS = "</ORDER>" } 
    $0 ~ /[^[:blank:]\n]/ { 
        printf "%s\n", $0 RS >> FILENAME "_" ++i ".xml" 
    }
' test.xml

I want to split this file on the ORDER tags only, so that each ORDER block ends up in its own file, as shown below:

File1.xml
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>        
File2.xml
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>
File3.xml
    <ORDER DESCRIPTION ="" ISVALID ="YES" 
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
        <ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
    </ORDER>

Upvotes: 5

Views: 5836

Answers (3)

kvantour

Reputation: 26551

To achieve what you request, I would not use awk but a proper XML parser such as xmlstarlet or xmllint. There is a single unknown here: the total number of nodes named ORDER. We could write a more advanced XPath for the selection, but we will keep it simple and count them first:

xmlstarlet sel -t -v 'count(//ORDER)' file.xml

Now that you have the count, you can loop over all cases and write them to the files:

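#!/usr/bin/env bash
xmlfile=file.xml

# Count the ORDER elements, then copy the i-th one into its own file.
n=$(xmlstarlet sel -t -v 'count(//ORDER)' "$xmlfile")
for ((i = 1; i <= n; i++)); do
   # (//ORDER)[i] indexes across the whole document, not per parent element.
   xmlstarlet sel -t -m "(//ORDER)[${i}]" -c . "$xmlfile" > "File${i}.xml"
done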

Upvotes: 7

Jotne

Reputation: 41460

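If you use GNU awk, this should give the requested result:

awk '/<ORDER[ >]/ {f=1;++a} f {print > "file_"a".xml"} /<\/ORDER>/ {f=0}' file

It prints only the lines from each opening <ORDER ...> tag through the matching </ORDER>, one section per file, named file_1.xml, file_2.xml, etc. The opening pattern is <ORDER followed by a space or >, so that tags carrying attributes, as in your sample, are matched too.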
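
Since the opening tags in the question carry attributes, it can help to check which lines a pattern actually selects before splitting anything. The following is only a minimal sketch: sample.xml is a hypothetical, trimmed copy of the question's data with the opening tags closed by ">", and the scratch directory is an assumption made for the example.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical, trimmed copy of the sample from the question.
cat > sample.xml <<'EOF'
<FOLDER NAME="MC_DEV">
    <ORDER DESCRIPTION ="" ISVALID ="YES">
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES">
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
    </ORDER>
</FOLDER>
EOF

# Count the lines each candidate pattern selects.
# grep -c exits non-zero when the count is 0, hence the "|| true".
exact=$(grep -c '<ORDER>' sample.xml || true)      # bare tag without attributes
prefix=$(grep -c '<ORDER[ >]' sample.xml || true)  # tag name followed by space or ">"
closing=$(grep -c '</ORDER>' sample.xml || true)   # closing tags

echo "bare <ORDER>: $exact  <ORDER plus attributes: $prefix  </ORDER>: $closing"
```

If the count for the opening pattern differs from the count for the closing pattern, a flag-based awk split will not produce one file per block.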

Upvotes: 5

Ed Morton

Reputation: 204488

With any awk in any shell on every UNIX box:

awk '/<ORDER/{f=1; out="file_"(++c)".xml"} f{print > out} /<\/ORDER>/{close(out); f=0}' file

It's obviously fragile, since it only does regexp matches against the text rather than parsing the XML, but it will work for the sample you posted and any similar text.
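
To see both the working case and the fragility, here is a rough sketch that runs the same flag-based awk on two small inputs. Everything named here is hypothetical and only for illustration: the split_orders wrapper, the file names sample.xml and fragile.xml, the output prefixes, and the scratch directory. It assumes a POSIX awk that supports -v.

```shell
#!/bin/sh
# Work in a scratch directory so the generated files are easy to inspect and discard.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# split_orders INPUT PREFIX: the flag-based awk from above, with the output
# prefix passed in as a variable (hypothetical helper for this example).
split_orders() {
    awk -v prefix="$2" '
        /<ORDER/    { f = 1; out = prefix "_" (++c) ".xml" }  # start a new output file
        f           { print > out }                            # copy lines while inside a block
        /<\/ORDER>/ { close(out); f = 0 }                      # block finished
    ' "$1"
}

# Input shaped like the sample in the question: every ORDER has a closing tag.
cat > sample.xml <<'EOF'
<FOLDER NAME="MC_DEV">
    <ORDER DESCRIPTION ="" ISVALID ="YES">
        <ATTRIBUTE NAME ="Normal" VALUE =""/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES">
        <ATTRIBUTE NAME ="Medium" VALUE =""/>
    </ORDER>
    <ORDER DESCRIPTION ="" ISVALID ="YES">
        <ATTRIBUTE NAME ="Advanced" VALUE =""/>
    </ORDER>
</FOLDER>
EOF

# Input that is valid XML but not "similar text": a self-closing ORDER.
cat > fragile.xml <<'EOF'
<FOLDER NAME="MC_DEV">
    <ORDER DESCRIPTION ="" ISVALID ="YES"/>
    <LOCATION DESCRIPTION ="" ISENABLED ="YES"/>
    <ORDER DESCRIPTION ="" ISVALID ="YES">
    </ORDER>
</FOLDER>
EOF

split_orders sample.xml file
split_orders fragile.xml frag

ls file_*.xml frag_*.xml
cat frag_1.xml
```

In the second input the self-closing ORDER never reaches the closing-tag rule, so the flag stays set and the LOCATION line is written to frag_1.xml as well. That is the kind of case where an XML parser is the safer choice.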

Upvotes: 3
