Reputation: 55
I have an XML file containing some entries characterized by specific keywords. I need to loop over the entries and extract the values of two keywords from each one, so that they can be used as variables inside the loop.
Here is an example of list.xml:
<?xml version="1.0" encoding="UTF-8"?>
<responses type="C-FIND">
<data-set xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
<element tag="0008,0005" vr="CS" vm="1" len="10" name="SpecificCharacterSet">ISO_IR 192</element>
<element tag="0008,0052" vr="CS" vm="1" len="6" name="QueryRetrieveLevel">STUDY</element>
<element tag="0008,0054" vr="AE" vm="1" len="8" name="RetrieveAETitle">PLATONE</element>
<element tag="0010,0010" vr="PN" vm="1" len="16" name="PatientName">Anon^1600373003</element>
<element tag="0020,000d" vr="UI" vm="1" len="42" name="StudyInstanceUID">1.3.76.13.99972.2.20181217085753.1484038.1</element>
</data-set>
<data-set xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
<element tag="0008,0005" vr="CS" vm="1" len="10" name="SpecificCharacterSet">ISO_IR 192</element>
<element tag="0008,0052" vr="CS" vm="1" len="6" name="QueryRetrieveLevel">STUDY</element>
<element tag="0008,0054" vr="AE" vm="1" len="8" name="RetrieveAETitle">PLATONE</element>
<element tag="0010,0010" vr="PN" vm="1" len="16" name="PatientName">Anon^1599844862</element>
<element tag="0020,000d" vr="UI" vm="1" len="42" name="StudyInstanceUID">1.3.76.13.99972.2.20180925142630.1456727.1</element>
</data-set>
</responses>
I need to extract the keywords "PatientName" and "StudyInstanceUID". I tried to use something like this:
grep -A2 -i "PatientName" list.xml | while read -r string ; do
PatientName="$(echo $string | grep -i "PatientName" | cut -d ">" -f 2 | cut -d "<" -f 1)"
StudyInstanceUID="$(echo $string | grep -i "StudyInstanceUID" | cut -d ">" -f 2 | cut -d "<" -f 1)"
echo "$PatientName"
echo "$StudyInstanceUID"
done
The problem is that I obtain a lot of empty rows! What's the problem?
[EDIT] What I would like to obtain from this example is something like this:
Anon^1600373003
1.3.76.13.99972.2.20181217085753.1484038.1
Anon^1599844862
1.3.76.13.99972.2.20180925142630.1456727.1
Thanks so much.
Ivan
Upvotes: 2
Views: 1858
Reputation: 3423
awk and sed are not designed to process XML. Please use a dedicated tool instead. I can recommend xidel.
Stdout:
$ xidel -s list.xml -e '
//data-set/(
element[@name="PatientName"],
element[@name="StudyInstanceUID"]
)
'
Anon^1600373003
1.3.76.13.99972.2.20181217085753.1484038.1
Anon^1599844862
1.3.76.13.99972.2.20180925142630.1456727.1
Variables:
$ xidel -s list.xml -e '
//data-set/(
eval(x"{concat("pn",position())}:=element[@name=""PatientName""]")[0],
eval(x"{concat("si",position())}:=element[@name=""StudyInstanceUID""]")[0]
)
'
pn1 := Anon^1600373003
si1 := 1.3.76.13.99972.2.20181217085753.1484038.1
pn2 := Anon^1599844862
si2 := 1.3.76.13.99972.2.20180925142630.1456727.1
These are internal variables that are just printed to stdout. Use --output-format=bash and Bash's built-in eval command to convert them to shell variables.
$ eval "$(xidel -s list.xml -e '
//data-set/(
eval(x"{concat("pn",position())}:=element[@name=""PatientName""]")[0],
eval(x"{concat("si",position())}:=element[@name=""StudyInstanceUID""]")[0]
)
' --output-format=bash)"
$ printf '%s\n' "$pn1" "$si1" "$pn2" "$si2"
Anon^1600373003
1.3.76.13.99972.2.20181217085753.1484038.1
Anon^1599844862
1.3.76.13.99972.2.20180925142630.1456727.1
Upvotes: 0
Reputation: 34334
As Raman alluded to in the comment, using an XML-aware tool to parse XML data is probably your best bet, especially if some of your XML may not be formatted as neatly as displayed in the question (eg, everything on one long line).
Assumptions:
- PatientName and StudyInstanceUID do not show up in larger strings (eg, LastPatientName or PreviousStudyInstanceUID)
- the PatientName element is always listed before the StudyInstanceUID element
One awk solution which eliminates the need for all of the sub-process calls to echo, grep and cut:
awk -F'[<>]' ' # define input field separators as "<" and ">"
/PatientName/ || /StudyInstanceUID/ { print $3 } # if we find one of our search strings then print field #3
' list.xml
The same as a one-liner, sans comments:
awk -F'[<>]' '/PatientName/ || /StudyInstanceUID/ { print $3 }' list.xml
The above generates:
Anon^1600373003
1.3.76.13.99972.2.20181217085753.1484038.1
Anon^1599844862
1.3.76.13.99972.2.20180925142630.1456727.1
As for capturing the output into variables (eg, within a while loop), we can make some small changes, eg:
awk -F'[<>]' '
/PatientName/ { pn=$3 } # store field #3 in variable "pn"
/StudyInstanceUID/ { printf "%s %s\n", pn, $3 } # print data to stdout
' list.xml
This will generate:
Anon^1600373003 1.3.76.13.99972.2.20181217085753.1484038.1
Anon^1599844862 1.3.76.13.99972.2.20180925142630.1456727.1
Feeding this into a while loop:
while read -r PatientName StudyInstanceUID
do
echo "+++++++++++++++++++"
echo "PatientName: ${PatientName}"
echo "StudyInstanceUID: ${StudyInstanceUID}"
done < <(awk -F'[<>]' ' /PatientName/ { pn=$3 } /StudyInstanceUID/ { printf "%s %s\n", pn, $3 } ' list.xml)
And this generates:
+++++++++++++++++++
PatientName: Anon^1600373003
StudyInstanceUID: 1.3.76.13.99972.2.20181217085753.1484038.1
+++++++++++++++++++
PatientName: Anon^1599844862
StudyInstanceUID: 1.3.76.13.99972.2.20180925142630.1456727.1
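If the first assumption is a concern, or if a PatientName could contain spaces (which would break the space-separated read above), here is a minimal sketch of a tighter variant. It matches the whole name="..." attribute instead of the bare keyword and separates the two values with a tab. The function name extract_pairs is made up for this example, and the sketch still assumes one element per line with PatientName listed before StudyInstanceUID.

```shell
# Hypothetical helper: print "PatientName<TAB>StudyInstanceUID" for each data-set.
# Reads the files given as arguments, or stdin when no file is given.
extract_pairs() {
    awk -F'[<>]' -v OFS='\t' '
        / name="PatientName"/      { pn = $3 }       # remember the most recent patient name
        / name="StudyInstanceUID"/ { print pn, $3 }  # emit the pair once the UID is seen
    ' "$@"
}
```

In bash it can replace the awk call in the loop, eg: while IFS=$'\t' read -r PatientName StudyInstanceUID; do ...; done < <(extract_pairs list.xml). Splitting on tabs only keeps a name such as "Doe^John Q" in one variable.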
Upvotes: 1
Reputation: 2603
Command:
grep -A2 -i "PatientName" list.xml
returns multiple lines:
<element tag="0010,0010" vr="PN" vm="1" len="16" name="PatientName">Anon^1600373003</element>
<element tag="0020,000d" vr="UI" vm="1" len="42" name="StudyInstanceUID">1.3.76.13.99972.2.20181217085753.1484038.1</element>
</data-set>
--
<element tag="0010,0010" vr="PN" vm="1" len="16" name="PatientName">Anon^1599844862</element>
<element tag="0020,000d" vr="UI" vm="1" len="42" name="StudyInstanceUID">1.3.76.13.99972.2.20180925142630.1456727.1</element>
</data-set>
so your while loop reads this output one line at a time. The empty rows you get are expected, because each iteration sees only a single line, and on a line such as:
<element tag="0010,0010" vr="PN" vm="1" len="16" name="PatientName">Anon^1600373003</element>
StudyInstanceUID is not present, so that variable is empty (and likewise PatientName is empty on the other lines).
In order to get the desired result, try this:
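grep -A1 -i "PatientName" list.xml | while read -r string ; do
# first line of each group: the PatientName element
PatientName="$(echo "$string" | grep -i "PatientName" | cut -d ">" -f 2 | cut -d "<" -f 1)"
# second line of each group: the StudyInstanceUID element
read -r string
StudyInstanceUID="$(echo "$string" | grep -i "StudyInstanceUID" | cut -d ">" -f 2 | cut -d "<" -f 1)"
echo "$PatientName"
echo "$StudyInstanceUID"
# skip the "--" separator that grep prints between groups
read -r string
done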
Each additional read string fetches the next line of grep's output. Be careful: this only works if the lines always appear in that order.
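If you would rather not depend on the line order, one alternative sketch is to let grep keep only the two kinds of lines you care about and extract the value from every line it passes on, so no iteration can produce an empty row. The function name extract_values and the read variable names are made up for this example, and it still assumes each element sits on its own line.

```shell
# Hypothetical helper: print the text content of every PatientName and
# StudyInstanceUID element, one value per line, in document order.
# Reads the files given as arguments, or stdin when no file is given.
extract_values() {
    grep -h -E 'name="(PatientName|StudyInstanceUID)"' "$@" |
    while IFS='<>' read -r lead tag value rest; do
        # Splitting on "<" and ">" leaves: text before the tag, the opening
        # tag, the element content, and whatever follows.
        printf '%s\n' "$value"
    done
}
```

Calling extract_values list.xml on the file from the question should print the four lines shown in the expected output.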
Upvotes: 0