Michael

Reputation: 172

How to replace specific xml node value using sed or awk

I want to replace the value of a specific XML node using sed or awk. I can't use specialized XML tools such as xmlstarlet or xmllint; I have to use sed or awk, just "basic" shell.

I have many large XML files. In each file I want to target two tags and replace their values, for example:

<desc:partNumber>2</desc:partNumber>
<desc:dateIssued>1870</desc:dateIssued>

The problem is that there are hundreds of tags with these names. However, these two tags have a parent tag that is unique within the whole XML file:

<desc:desc ID="DESC_VOLUME_0001">

Another problem is that the position (line number) of the <desc:partNumber> and <desc:dateIssued> tags inside the parent <desc:desc ID="DESC_VOLUME_0001"> is different in every file.

I think the solution would be:

  1. Target the parent <desc:desc ID="DESC_VOLUME_0001"> and extract it, together with its children, into a variable
  2. Iterate over the children, find the line numbers of <desc:partNumber> and <desc:dateIssued>, and save them to variables
  3. Pass each line number to a sed command and replace the current value of that tag with the new value (the new value will be read from a .csv file)

I tried to write this sed command. As you can see, I used 'n' to step over lines, but the number of lines to skip needs to be variable.

sed -i '/desc:desc ID="DESC_VOLUME_0001"/{n;n;n;n;n;n;n;n;n;s/'"${OLD_DATE_ISSUED}"'/'"${NEW_DATE_ISSUED}"'/}'

Parent node with children:

<desc:desc ID="DESC_VOLUME_0001"> 
    <desc:physicalDescription> 
        <desc:note>text</desc:note> 
    </desc:physicalDescription>  
    <desc:titleInfo> 
        <desc:partNumber>2</desc:partNumber> 
    </desc:titleInfo>  
    <desc:originInfo> 
        <desc:dateIssued>1870</desc:dateIssued> 
    </desc:originInfo>  
    <desc:identifier type="uuid">81e32d30-6388-11e6-8336-005056827e52</desc:identifier> 
</desc:desc> 

Can anybody help me achieve this?

Upvotes: 0

Views: 546

Answers (1)

Raman Sailopal

Reputation: 12867

With the example data in the file xmldata (this needs GNU awk, because gensub is a gawk extension):

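awk -v dID="DESC_VOLUME_0001" -v part="5" -v dissue=1850 -F'[<>]' '
   # Remember the ID of the most recent desc element
   $2 ~ /desc ID/ {
       split($2, arr, "\"")
       descID = arr[2]
   }
   # Under the target element, replace the first match of the old value
   $2 ~ /desc:partNumber/ {
       if (descID == dID) {
           $0 = gensub($3, part, 1)
       }
   }
   $2 ~ /desc:dateIssued/ {
       if (descID == dID) {
           $0 = gensub($3, dissue, 1)
       }
   }
   # Print every line, changed or not
   1' xmldata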

One liner:

awk -v dID="DESC_VOLUME_0001" -v part="5" -v dissue=1850 -F'[<>]' '$2 ~ /desc ID/ { split($2,arr,"\""); descID=arr[2] } $2 ~ /desc:partNumber/ { if (descID==dID) { $0=gensub($3,part,1) } } $2 ~ /desc:dateIssued/ { if (descID==dID) { $0=gensub($3,dissue,1) } } 1' xmldata

Here we set the field delimiters to < or >. We also set dID to the desc ID we want to search for, part to the new partNumber value, and dissue to the new dateIssued value.
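
To see what this field splitting produces, here is a small sketch that prints every field awk sees for one line of the sample. The helper name show_fields is made up for illustration and is not part of the command above.

```shell
# Hypothetical helper: print each field of every input line when the
# field separator is "<" or ">".
show_fields() {
  awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

# One line taken from the sample element in the question
printf '%s\n' '        <desc:partNumber>2</desc:partNumber>' | show_fields
```

For that line, $1 is the leading whitespace, $2 is the tag name desc:partNumber, and $3 is the value 2, which is why the command tests $2 and replaces $3.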

We then search for the desc ID text in the line and split the line based on double quotes to get the second index of the array arr which is then used to create the variable descID.
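
The ID extraction can be tried on its own. This is only a sketch; extract_desc_id is a hypothetical name, and it assumes the element is written on one line as in the sample.

```shell
# Hypothetical helper: print the ID attribute value of every line whose
# tag contains "desc ID".
extract_desc_id() {
  awk -F'[<>]' '$2 ~ /desc ID/ { split($2, arr, "\""); print arr[2] }'
}

# $2 is: desc:desc ID="DESC_VOLUME_0001"
# Splitting it on double quotes leaves the ID in arr[2].
printf '%s\n' '<desc:desc ID="DESC_VOLUME_0001">' | extract_desc_id
```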

We further search for partNumber and dateIssued, checking whether dID equals descID. If they match, we use the gensub function to replace the first occurrence of the third delimited field's text (the old value) in the line $0 with the passed variable, and set $0 to the result. Finally, the pattern 1 prints the line (changed or otherwise).
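
If gawk is not available, the same idea might be sketched in plain awk with sub() instead of gensub. This is a variant under assumptions, not the command above: replace_in_desc is a hypothetical helper name, the flag reset at the closing </desc:desc> tag is an addition, ID is assumed to be the first attribute, and each target tag is assumed to sit on one line together with its value.

```shell
# Hypothetical helper: replace_in_desc ID NEW_PART NEW_DATE FILE
# Prints the edited XML to stdout; the input file is not modified.
replace_in_desc() {
  awk -v dID="$1" -v part="$2" -v dissue="$3" -F'[<>]' '
    # Opening desc element: note whether it is the one we are looking for
    $2 ~ /^desc:desc ID=/ { split($2, arr, "\""); inside = (arr[2] == dID) }
    # Closing desc element: stop editing
    $2 == "/desc:desc" { inside = 0 }
    # Swap the text between the opening and closing tag on the same line
    inside && $2 == "desc:partNumber" { sub(/>[^<]*</, ">" part "<") }
    inside && $2 == "desc:dateIssued" { sub(/>[^<]*</, ">" dissue "<") }
    1
  ' "$4"
}

# Example run against a scratch copy of the sample element
sample=$(mktemp)
cat > "$sample" <<'EOF'
<desc:desc ID="DESC_VOLUME_0001">
    <desc:titleInfo>
        <desc:partNumber>2</desc:partNumber>
    </desc:titleInfo>
    <desc:originInfo>
        <desc:dateIssued>1870</desc:dateIssued>
    </desc:originInfo>
</desc:desc>
EOF
replace_in_desc DESC_VOLUME_0001 5 1850 "$sample"
rm -f "$sample"
```

To change a file in place, redirect the output to a temporary file and move it over the original once awk has finished. New values containing & or a backslash would need escaping first, because sub() treats them specially in the replacement.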

Upvotes: 2
