Reputation: 172
I want to replace a specific XML node value using sed or awk. I can't use specialized packages for parsing XML like xmlstarlet, xmllint, etc. I have to use sed or awk, just "basic" shell.
I have many big XML files. In each file I want to target two tags and replace their values, for example:
<desc:partNumber>2</desc:partNumber>
<desc:dateIssued>1870</desc:dateIssued>
The problem is that there are hundreds of tags with these names. But these two tags have a parent tag that is unique within the whole XML file:
<desc:desc ID="DESC_VOLUME_0001">
Another problem is that the location (line numbers) of the tags <desc:partNumber> and <desc:dateIssued> inside the parent <desc:desc ID="DESC_VOLUME_0001"> is different in every file.
I think the solution would be:
1. Find <desc:desc ID="DESC_VOLUME_0001"> and its children and save them to a variable.
2. Find <desc:partNumber> and <desc:dateIssued> in it and save them to variables.
3. Build a sed command that replaces the current value of each tag with the new value (the new value will be read from a .csv file).
I tried to create this sed command. You can see I used 'n' to move over lines, but the number of lines to skip needs to be variable.
sed -i '/desc:desc ID="DESC_VOLUME_0001"/{n;n;n;n;n;n;n;n;n;s/'"${OLD_DATE_ISSUED}"'/'"${NEW_DATE_ISSUED}"'/}'
Parent node with children:
<desc:desc ID="DESC_VOLUME_0001">
<desc:physicalDescription>
<desc:note>text</desc:note>
</desc:physicalDescription>
<desc:titleInfo>
<desc:partNumber>2</desc:partNumber>
</desc:titleInfo>
<desc:originInfo>
<desc:dateIssued>1870</desc:dateIssued>
</desc:originInfo>
<desc:identifier type="uuid">81e32d30-6388-11e6-8336-005056827e52</desc:identifier>
</desc:desc>
Can anybody help how to achieve this?
Upvotes: 0
Views: 546
Reputation: 12867
With the example data in the file xmldata:
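awk -v dID="DESC_VOLUME_0001" -v part="5" -v dissue=1850 -F'[<>]' '
# Remember the ID of the desc element we are currently in
$2 ~ /desc ID/ {
    split($2, arr, "\"")
    descID = arr[2]
}
# Inside the target element, replace the first match of the old value ($3)
$2 ~ /desc:partNumber/ && descID == dID {
    $0 = gensub($3, part, 1)
}
$2 ~ /desc:dateIssued/ && descID == dID {
    $0 = gensub($3, dissue, 1)
}
# Print every line, changed or not
1' xmldata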
One liner:
awk -v dID="DESC_VOLUME_0001" -v part="5" -v dissue=1850 -F'[<>]' '$2 ~ /desc ID/ { split($2,arr,"\""); descID=arr[2] } $2 ~ /desc:partNumber/ && descID==dID { $0=gensub($3,part,1) } $2 ~ /desc:dateIssued/ && descID==dID { $0=gensub($3,dissue,1) } 1' xmldata
Here we set the field delimiters to < or >. We also set dID to the desc ID we want to search for, part to the new partNumber value, and dissue to the new dateIssued value.
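To see what that field separator does to a line, here is a minimal sketch that prints each field awk produces for one sample line. The function name show_fields is hypothetical and exists only for illustration.

```shell
# Hypothetical helper: print every field awk sees when splitting on < or >
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields '<desc:partNumber>2</desc:partNumber>'
```

The tag name ends up in $2 and the value in $3, which is why the program tests $2 and replaces $3.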
We then look for the desc ID text in the line and split that field on double quotes. The second element of the array arr is the ID, which is stored in the variable descID.
We then look for partNumber and dateIssued lines and check whether descID equals dID. If it does, we replace the first match of the third field (the old value) in the line $0 with the passed variable using the gensub function (a GNU awk extension) and assign the result back to $0. Finally, the pattern 1 prints every line, changed or not.
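Since the question also allows sed, a range address may be a simpler alternative to counting lines with n. The following is only a sketch, not a tested solution for the real files. It assumes that each value sits on one line together with its opening and closing tag, that desc:desc elements are not nested, and that the ID and the new value contain no characters special to sed (such as |, & or \). The function name replace_in_desc is hypothetical.

```shell
# Hypothetical helper: replace the text of one tag, but only between
# <desc:desc ID="..."> with the given ID and the next </desc:desc>.
# Usage: replace_in_desc FILE DESC_ID TAG NEW_VALUE   (result goes to stdout)
replace_in_desc() {
    file=$1 id=$2 tag=$3 new=$4
    # The range address limits the s command to the lines of the target block
    sed "/<desc:desc ID=\"$id\">/,/<\/desc:desc>/ s|<$tag>[^<]*</$tag>|<$tag>$new</$tag>|" "$file"
}
```

For example, replace_in_desc xmldata DESC_VOLUME_0001 desc:dateIssued 1850 > xmldata.new would write a changed copy. Because the old value is matched as [^<]*, it does not need to be known in advance.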
Upvotes: 2