Reputation: 71
I have an XML file containing the tags below, and I want to split it into separate files.
<?xml version="1.0" encoding="UTF-8"?>
<EMPRMART CREATION_DATE="08/20/2018 18:06:44" REPOSITORY_VERSION="187.96">
<REPOSITORY NAME="REP_DEV" VERSION="187" CODEPAGE="UTF-8" DATABASETYPE="Sybase">
<FOLDER NAME="MC_DEV"
<CONFIG DESCRIPTION ="Default ORDER configuration object" ISDEFAULT ="YES" NAME ="default_ORDER_config" VERSIONNUMBER ="1">
<ATTRIBUTE NAME ="Advanced" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</CONFIG>
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Normal" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Medium" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Advanced" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<LOCATION DESCRIPTION ="" ISENABLED ="YES"
</LOCATION>
</FOLDER>
</REPOSITORY>
</EMPRMART>
Below is the code I tried, but it writes every single line to a new file:
awk '
BEGIN { RS = "</ORDER>" }
$0 ~ /[^[:blank:]\n]/ {
printf "%s\n", $0 RS >> FILENAME "_" ++i ".xml"
}
' test.xml
I want to split this file on the ORDER tags alone, as shown below:
File1.xml
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Normal" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
File2.xml
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Medium" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
File3.xml
<ORDER DESCRIPTION ="" ISVALID ="YES"
<ATTRIBUTE NAME ="Advanced" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
Upvotes: 5
Views: 5836
Reputation: 26551
To achieve what you request, I would not use awk but rather a proper XML tool such as xmlstarlet or xmllint. There is a single unknown here: the total number of nodes named ORDER. We could write a more advanced XPath for the selection, but we will keep it simple:
xmlstarlet sel -t -v 'count(//ORDER)' file.xml
Now that you have the count, you can loop over all cases and write them to the files:
#!/usr/bin/env bash
xmlfile=file.xml
# Count the ORDER nodes once, then extract them one by one
n=$(xmlstarlet sel -t -v 'count(//ORDER)' "$xmlfile")
for i in $(seq 1 "$n"); do
xmlstarlet sel -t -m "//ORDER[${i}]" -c . "$xmlfile" > "File${i}.xml"
done
Upvotes: 7
Reputation: 41460
If you use GNU awk, this should give the requested result:
awk '/<ORDER/ {f=1; ++a} f {print > ("file_" a ".xml")} /<\/ORDER>/ {f=0}' file
It prints only the lines from each opening <ORDER tag to its closing </ORDER> tag, one section per file, into files named file_1.xml, file_2.xml, etc.
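Since GNU awk is assumed here, the record-separator idea from the question can also be made to work. The sketch below is only an illustration under assumptions: it needs an awk that accepts a multi-character RS (GNU awk does; POSIX leaves this unspecified, and some awks use only the first character, which would explain the many tiny files the question describes). The file names sample.xml and order_N.xml are hypothetical, and the sample is a trimmed copy of the question's data with the opening tags closed.

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not touch existing files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Trimmed sample (hypothetical file name), opening tags closed with ">".
cat > sample.xml <<'EOF'
<FOLDER NAME="MC_DEV">
<CONFIG DESCRIPTION ="Default ORDER configuration object" NAME ="default_ORDER_config">
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</CONFIG>
<ORDER DESCRIPTION ="" ISVALID ="YES">
<ATTRIBUTE NAME ="Normal" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<ORDER DESCRIPTION ="" ISVALID ="YES">
<ATTRIBUTE NAME ="Medium" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<LOCATION DESCRIPTION ="" ISENABLED ="YES">
</LOCATION>
</FOLDER>
EOF

awk '
# Multi-character RS: each record ends at a closing ORDER tag
# (plus the newline that follows it, if any). Not POSIX.
BEGIN { RS = "</ORDER>\n?" }

# Find where the opening tag starts; 0 means this record has none
# (for example the tail of the file after the last </ORDER>).
{ start = index($0, "<ORDER") }

start {
    out = "order_" (++n) ".xml"
    # Drop everything before the opening tag and put the closing tag back.
    printf "%s</ORDER>\n", substr($0, start) > out
    close(out)
}
' sample.xml
```

Like the one-liner above, this matches text rather than parsing XML, so it assumes ORDER elements are not nested and that "<ORDER" does not appear inside attribute values or comments.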
Upvotes: 5
Reputation: 204488
With any awk in any shell on every UNIX box:
awk '/<ORDER/{f=1; out="file_"(++c)".xml"} f{print > out} /<\/ORDER>/{close(out); f=0}' file
It's obviously fragile, as it's just doing regexp matches against text, not parsing the XML, but it'll work for the sample you posted and any similar text.
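To try the command without touching real data, something like the following could be used. It is a minimal sketch, not part of the original answer: the file name sample.xml is made up, the sample is a trimmed copy of the question's data with the opening tags closed, and the final count comparison is just a quick sanity check that one file was written per ORDER block.

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not touch existing files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Trimmed sample (hypothetical file name), opening tags closed with ">".
cat > sample.xml <<'EOF'
<FOLDER NAME="MC_DEV">
<CONFIG DESCRIPTION ="Default ORDER configuration object" NAME ="default_ORDER_config">
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</CONFIG>
<ORDER DESCRIPTION ="" ISVALID ="YES">
<ATTRIBUTE NAME ="Normal" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<ORDER DESCRIPTION ="" ISVALID ="YES">
<ATTRIBUTE NAME ="Medium" VALUE =""/>
<ATTRIBUTE NAME ="Order type" VALUE ="NO"/>
</ORDER>
<LOCATION DESCRIPTION ="" ISENABLED ="YES">
</LOCATION>
</FOLDER>
EOF

# Same logic as the one-liner, spread over several lines:
# start copying at an opening ORDER tag, stop after the closing one.
awk '
/<ORDER/    { f = 1; out = "file_" (++c) ".xml" }
f           { print > out }
/<\/ORDER>/ { close(out); f = 0 }
' sample.xml

# Sanity check: the number of files should equal the number of opening tags.
expected=$(grep -c '<ORDER' sample.xml)
actual=$(ls file_*.xml | wc -l)
echo "ORDER blocks: $expected, files written: $actual"
```

Calling close(out) after each block keeps only one output file open at a time, which matters for awks that limit the number of simultaneously open files.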
Upvotes: 3