Reputation: 79
I have a file that has lots of XML nodes:
<output>
<file name="user.java">
</file>
<file name="random.java">
<error line="52" column="3" severity="warning" message="User is not found." source="randomSource"/>
</file>
<output/>
Now I need to replace the source attribute in each error node with the name attribute of the enclosing file node and write the result to a file. The output file should contain only the error rows:
<error line="52" column="3" severity="warning" message="User is not found." name="random.java"/>
Preferably, name should be the first attribute:
<error name="random.java" line="52" column="3" severity="warning" message="User is not found." />
So the new file should only contain the error nodes and I can only use the default tools such as sed/awk/cut/etc...
I have only got as far as printing the error line but can't figure out how to do the above:
awk -vtag=file -vp=0 '{
if($0~("^<"tag)){p=1;next}
if($0~("^</"tag)){p=0;printf("\n");next}
if(p==1){$1=$1;printf("%s",$0)}
}' infile
Upvotes: 0
Views: 253
Reputation: 203254
Assuming your input really is structured as you show in your example (i.e. no newlines within <...>s, only 1 set of <...>s per line, and all white space in each line is blank chars), this will work using any awk in any shell on every Unix box. It uses literal string operations with blanks as boundaries, so it'll work even if any regexp or backreference metachars exist in the text or if any of the target strings are substrings of other strings:
$ cat tst.awk
{ tag=$0; gsub(/^ *< *| .*$/,"",tag) }
(tag == "file") && match($0,/ name="[^"]+"/) {
name = substr($0,RSTART+1,RLENGTH-1)
}
(tag == "error") && match($0,/ source="[^"]+"/) {
$0 = substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)
match($0,/ *< *[^ ]+ /)
$0 = substr($0,1,RLENGTH) name substr($0,RSTART+RLENGTH-1)
}
{ print }
$ awk -f tst.awk file
<output>
<file name="user.java">
</file>
<file name="random.java">
<error name="random.java" line="52" column="3" severity="warning" message="User is not found."/>
</file>
<output/>
or if you prefer to just replace the source= with name= in-situ:
$ cat tst.awk
{ tag=$0; gsub(/^ *< *| .*$/,"",tag) }
(tag == "file") && match($0,/ name="[^"]+"/) {
name = substr($0,RSTART+1,RLENGTH-1)
}
(tag == "error") && match($0,/ source="[^"]+"/) {
$0 = substr($0,1,RSTART) name substr($0,RSTART+RLENGTH)
}
{ print }
$ awk -f tst.awk file
<output>
<file name="user.java">
</file>
<file name="random.java">
<error line="52" column="3" severity="warning" message="User is not found." name="random.java"/>
</file>
<output/>
If you ONLY want the "error" line printed then in the above just change:
}
{ print }
to:
print
}
so the print only happens inside the tag == "error" block.
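Putting that change together with the name-first script gives something like the sketch below. The heredoc sample and the temporary file are assumptions added only so the snippet runs on its own; substitute your real input file.

```shell
# Sample input from the question, written to a temporary file (assumption for the demo).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<output>
<file name="user.java">
</file>
<file name="random.java">
<error line="52" column="3" severity="warning" message="User is not found." source="randomSource"/>
</file>
<output/>
EOF

result=$(awk '
# Reduce the line to its tag name: strip the leading "<" and everything from the first blank on.
{ tag = $0; gsub(/^ *< *| .*$/, "", tag) }

# Remember the most recent name="..." attribute seen on a file element.
tag == "file" && match($0, / name="[^"]+"/) {
    name = substr($0, RSTART + 1, RLENGTH - 1)
}

# Drop source="...", insert the saved name right after the tag, and print only this line.
tag == "error" && match($0, / source="[^"]+"/) {
    $0 = substr($0, 1, RSTART - 1) substr($0, RSTART + RLENGTH)
    match($0, / *< *[^ ]+ /)
    $0 = substr($0, 1, RLENGTH) name substr($0, RSTART + RLENGTH - 1)
    print
}
' "$tmp")

rm -f "$tmp"
printf '%s\n' "$result"
```

To write the result to a file instead of the terminal, redirect the awk output with > as usual.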
Upvotes: 1
Reputation: 2327
Try this simple awk program:
level == 0 && $0 ~ "<" tag ".*>" {
print
level++
# get "name" attribute
gsub(/^.*name="/, "")
gsub(/".*$/, "")
name = $0
next
}
level == 1 && /<error.*>/ {
# remove "source" attribute
gsub(/ source="[^"]*"/, "")
# put "name" attribute at the beginning of "error" tag
gsub(/<error /, "<error name=\"" name "\" ")
print
next
}
level == 1 && $0 ~ "</" tag ">" {
print
level--
next
}
{
print
}
Called like this:
$ cat xmlerr.xml | awk -v tag="file" -f xmlerr.awk
<output>
<file name="user.java">
</file>
<file name="random.java">
<error name="random.java" line="52" column="3" severity="warning" message="User is not found."/>
</file>
</output>
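Remove any print commands you don't need, e.g. keep only the one in the error block if the output should contain just the error lines.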
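For instance, a trimmed sketch that writes only the error lines to a file could look like this. The output file name errors.xml and the temporary directory are assumptions for illustration. Note also that name is pasted into a gsub replacement string, so a file name containing & or a backslash would need escaping first.

```shell
# Work in a temporary directory so the demo does not touch real files (assumption).
workdir=$(mktemp -d)
cat > "$workdir/xmlerr.xml" <<'EOF'
<output>
<file name="user.java">
</file>
<file name="random.java">
<error line="52" column="3" severity="warning" message="User is not found." source="randomSource"/>
</file>
</output>
EOF

awk -v tag="file" '
# Opening tag: remember the value of its name attribute, print nothing.
level == 0 && $0 ~ "<" tag ".*>" {
    level++
    name = $0
    gsub(/^.*name="/, "", name)
    gsub(/".*$/, "", name)
    next
}
# Error line: remove source, put name first, and print it.
level == 1 && /<error.*>/ {
    gsub(/ source="[^"]*"/, "")
    gsub(/<error /, "<error name=\"" name "\" ")
    print
    next
}
# Closing tag: leave the element, print nothing.
level == 1 && $0 ~ "</" tag ">" {
    level--
    next
}
' "$workdir/xmlerr.xml" > "$workdir/errors.xml"

result=$(cat "$workdir/errors.xml")
rm -rf "$workdir"
printf '%s\n' "$result"
```

The final catch-all block is dropped, so lines outside the error rule are never printed.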
ALTERNATIVE
If you want to suppress the "name" attribute in the opening "file" tag, the first block becomes:
level == 0 && $0 ~ "<" tag ".*>" {
name = $0
level++
n = gsub(/^.*name="/, "", name)
gsub(/".*$/, "", name)
# if substitution done, remove "name" attribute in the original line before printing
if (n > 0) {
gsub(/ name="[^"]*"/, "")
}
print
next
}
and the output:
<output>
<file>
</file>
<file>
<error name="random.java" line="52" column="3" severity="warning" message="User is not found."/>
</file>
</output>
Upvotes: 3
Reputation: 8711
Try this Perl solution:
$ cat stacky.txt
<output>
<file name="user.java">
</file>
<file name="random.java">
<error line="52" column="3" severity="warning" message="User is not found." source="randomSource"/>
</file>
<output/>
$ perl -ne ' /<file (name=\S+)>/ and $x=$1; if(/<error/) { s/(<error)(.*?)\s*\bsource="[^"]+"(.*)/$1 $x$2 $3/ ; print } ' stacky.txt
<error name="random.java" line="52" column="3" severity="warning" message="User is not found." />
Upvotes: 2