Stacky

Reputation: 79

How to replace one property for another in XML using Sed and Awk

I have a file that has lots of XML nodes:

<output>
<file name="user.java">
</file>

<file name="random.java">
<error line="52" column="3" severity="warning" message="User is not found." source="randomSource"/>
</file>
<output/>

Now I need to replace the source attribute in each error node with the name attribute of its enclosing file node, and write the result to a file. The output file should contain only the error rows:

<error line="52" column="3" severity="warning" message="User is not found." name="random.java"/>

Preferably, name should be the first attribute:

<error name="random.java" line="52" column="3" severity="warning" message="User is not found." />

So the new file should contain only the error nodes, and I can only use standard tools such as sed, awk, cut, etc.

I have only got as far as printing the error line but can't figure out how to do the above:

awk -vtag=file -vp=0 '{
if($0~("^<"tag)){p=1;next}
if($0~("^</"tag)){p=0;printf("\n");next}
if(p==1){$1=$1;printf("%s",$0)}
}' infile 

Upvotes: 0

Views: 253

Answers (3)

Ed Morton

Reputation: 203254

Assuming your input really is structured as you show in your example (i.e. no newlines within <...>s, only 1 set of <...>s per line, and all white space in each line is blank chars), this will work using any awk in any shell on every Unix box. It uses literal string operations with blanks as boundaries, so it'll work even if any regexp or backreference metachars exist in the text or if any of the target strings are substrings of other strings:

$ cat tst.awk
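# Extract the tag name: strip the leading "<" and everything from the first blank on.
{ tag=$0; gsub(/^ *< *| .*$/,"",tag) }

# Remember the most recent file's name="..." attribute (without its leading blank).
(tag == "file") && match($0,/ name="[^"]+"/) {
    name = substr($0,RSTART+1,RLENGTH-1)
}

(tag == "error") && match($0,/ source="[^"]+"/) {
    # Cut out the source="..." attribute.
    $0 = substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)
    # Re-insert the saved name="..." right after the tag name.
    match($0,/ *< *[^ ]+ /)
    $0 = substr($0,1,RLENGTH) name substr($0,RSTART+RLENGTH-1)
}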

{ print }

$ awk -f tst.awk file
<output>
<file name="user.java">
</file>

<file name="random.java">
<error name="random.java" line="52" column="3" severity="warning" message="User is not found."/>
</file>
<output/>

or if you prefer to just replace the source= with name= in-situ:

$ cat tst.awk
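# Extract the tag name: strip the leading "<" and everything from the first blank on.
{ tag=$0; gsub(/^ *< *| .*$/,"",tag) }

# Remember the most recent file's name="..." attribute (without its leading blank).
(tag == "file") && match($0,/ name="[^"]+"/) {
    name = substr($0,RSTART+1,RLENGTH-1)
}

(tag == "error") && match($0,/ source="[^"]+"/) {
    # Keep the blank before source=, then put name="..." where source="..." was.
    $0 = substr($0,1,RSTART) name substr($0,RSTART+RLENGTH)
}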

{ print }

$ awk -f tst.awk file
<output>
<file name="user.java">
</file>

<file name="random.java">
<error line="52" column="3" severity="warning" message="User is not found." name="random.java"/>
</file>
<output/>

If you ONLY want the "error" line printed then in the above just change:

}

{ print }

to:

    print
}

so the print only happens inside the tag == "error" block.
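
For reference, here is a minimal sketch of that error-only variant applied to the first script, wrapped in a shell function so the result can be redirected to a file. The function name errors_only and the file names in the usage comment are hypothetical; the same assumptions about the input layout apply:

```shell
# Hypothetical wrapper: prints only the rewritten <error> lines.
errors_only() {
    awk '
    # Extract the tag name from the current line.
    { tag=$0; gsub(/^ *< *| .*$/,"",tag) }

    # Remember the name="..." attribute of the most recent file tag.
    (tag == "file") && match($0,/ name="[^"]+"/) {
        name = substr($0,RSTART+1,RLENGTH-1)
    }

    # Drop source="...", insert the saved name="..." first, print this line only.
    (tag == "error") && match($0,/ source="[^"]+"/) {
        $0 = substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)
        match($0,/ *< *[^ ]+ /)
        $0 = substr($0,1,RLENGTH) name substr($0,RSTART+RLENGTH-1)
        print
    }
    ' "$@"
}

# Usage (file names are placeholders):
#   errors_only infile > errors.txt
```

Since there is no unconditional { print } block any more, lines that are not error tags produce no output.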

Upvotes: 1

Arnaud Valmary

Reputation: 2327

Try this simple awk program:

level == 0 && $0 ~ "<" tag ".*>" {
    print
    level++
    # get "name" attribute
    gsub(/^.*name="/, "")
    gsub(/".*$/, "")
    name = $0
    next
}
level == 1 && /<error.*>/ {
    # remove "source" attribute
    gsub(/ source="[^"]*"/, "")
    # put "name" attribute at the beginning of "error" tag
    gsub(/<error /, "<error name=\"" name "\" ")
    print
    next
}
level == 1 && $0 ~ "</" tag ">" {
    print
    level--
    next
}
{
    print
}

Called like this:

$ cat xmlerr.xml | awk -v tag="file" -f xmlerr.awk 
<output>
    <file name="user.java">
    </file>
    
    <file name="random.java">
    <error name="random.java" line="52" column="3" severity="warning" message="User is not found."/>
    </file>
</output>

Remove any print commands you don't need (for example, keep only the print in the error block if you want just the error lines).
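
As a rough sketch of that trimming, assuming only the error lines are wanted, the program could be reduced to something like the following. The function name errors_only_by_level and the output file name in the usage comment are hypothetical; the name is extracted into a separate variable so the file line itself is never printed:

```shell
# Hypothetical wrapper around the trimmed program: only error lines are printed.
errors_only_by_level() {
    awk -v tag="file" '
    # Opening tag: remember its name attribute, print nothing.
    level == 0 && $0 ~ "<" tag ".*>" {
        level++
        name = $0
        gsub(/^.*name="/, "", name)
        gsub(/".*$/, "", name)
        next
    }
    # Error tag: drop source, put name first, print.
    level == 1 && /<error.*>/ {
        gsub(/ source="[^"]*"/, "")
        gsub(/<error /, "<error name=\"" name "\" ")
        print
        next
    }
    # Closing tag: leave the current level, print nothing.
    level == 1 && $0 ~ "</" tag ">" { level-- }
    ' "$@"
}

# Usage (output file name is a placeholder):
#   errors_only_by_level xmlerr.xml > errors.txt
```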

ALTERNATIVE

If you want to suppress the "name" attribute in the opening "file" tag, the first block becomes:

level == 0 && $0 ~ "<" tag ".*>" {
    name = $0
    level++
    n = gsub(/^.*name="/, "", name)
    gsub(/".*$/, "", name)
    # if substitution done, remove "name" attribute in the original line before printing
    if (n > 0) {
        gsub(/ name="[^"]*"/, "")
    }
    print
    next
}

and the output:

<output>
    <file>
    </file>
    
    <file>
    <error name="random.java" line="52" column="3" severity="warning" message="User is not found."/>
    </file>
</output>

Upvotes: 3

stack0114106

Reputation: 8711

Try this Perl solution:

$ cat stacky.txt
<output>
<file name="user.java">
</file>

<file name="random.java">
<error line="52" column="3" severity="warning" message="User is not found." source="randomSource"/>
</file>
<output/>
   
$ perl -ne  ' /<file (name=\S+)>/ and $x=$1; if(/<error/) { s/(\<error)(.*)(\bsource="[^"]+")(.+)/$1 $x $2 $4/g  ; print }  ' stacky.txt
<error name="random.java"  line="52" column="3" severity="warning" message="User is not found."  />
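
If Perl is not available, the same capture-and-reinsert idea can probably be done with sed alone by keeping the name attribute in the hold space. This is only a sketch, assuming one tag per line as in the sample and that every file tag carries a name attribute; the function name errors_only_sed and the output file name are hypothetical:

```shell
# Hypothetical wrapper: sed-only version that prints just the rewritten error lines.
errors_only_sed() {
    sed -n '
    # On a file line, reduce the line to its name="..." attribute and save it.
    /<file /{
        s/.*\(name="[^"]*"\).*/\1/
        h
    }
    # On an error line, drop source="...", append the saved attribute after a
    # newline, then move it to just after the tag name and print.
    /<error /{
        s/ source="[^"]*"//
        G
        s/\(<error\)\(.*\)\n\(.*\)/\1 \3\2/
        p
    }
    ' "$@"
}

# Usage (output file name is a placeholder):
#   errors_only_sed stacky.txt > errors.txt
```

Unlike the Perl one-liner, this version does not leave doubled blanks around the moved attribute, because the blank before source= is removed together with the attribute.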

Upvotes: 2
