I have an XML file with multiple elements. I'd like to extract specific attributes for each package element: codepath, name and nativelibrarypath.
The system is very basic: a limited Linux terminal with bash, awk, grep etc. No extra packages such as xmllint are available, so all we have to work with is probably bash, awk, sed and grep.
In the script, I'd like to assign the attribute values to named shell variables so I can use them to create an output file, which should look like this:
[for each <package> element processed]
..
name:<from name attribute>
path:<from nativelibrarypath attribute>
apk:<from codepath attribute>
...
The XML source is:
<package codepath="/data/app/com.project.t2i-2.apk" flags="0" ft="13a837c2068" it="13a83704ea3" name="com.project.t2i" nativelibrarypath="/data/data/com.project.t2i/lib" userid="10040" ut="13a837c2ecb" version="1">
<sigs count="1">
<cert index="3" key="308201e53082014ea0030201020204506825ae300d06092a86
4886f70d01010505003037310b30090603550406130255533110300e060355040a13074
16e64726f6964311630140603550403130d416e64726f6964204465627567301e170d31
32303933303130353735305a170d3432303932333130353735305a3037310b300906035
50406130255533110300e060355040a1307416e64726f6964311630140603550403130d
416e64726f696420446562756730819f300d06092a864886f70d010101050003818d003
08189028181009ce1c5fd64db794fd787887e8a2dccf6798ddd2fd6e1d8ab04cd8cdd9e
bf721fb3ed6be1d67c55ce729b1e1d32b200cbcfc91c798ef056bc9b2cbc66a396aed6b
a3629a18e4839353314252811412202500f11a11c3bf4eb41b2a8747c3c791c89391443
39036345b15b5e080469ac5f536fd9edffcd52dcbdf88cf43c580abd0203010001300d0
6092a864886f70d01010505000381810071fa013b4560f16640ed261262f32085a51fca
63fa6c5c46fde9a862b56b6d6f17dd49643086a39a06314426ba9a38b784601197246f8
d568e349a93bc6af315455de7a8923f40d4051a51e1658ee34aca41494ab94ce978ae38
609803dfb3004806634e6e78dd0be26fe75843958711935ffc85f9fcf81523ce23c86bc
c5c7a">
</cert></sigs>
<perms>
<item name="android.permission.WRITE_EXTERNAL_STORAGE">
</item></perms>
</package>
I appreciate the purists will balk at this; however, with such a limited toolset I'm afraid bash/awk is the only viable way. I accept that poorly formatted XML may not be parsed. But as it stands, all elements include the same set of attributes, always in the same order as above.
I tried this, but it is hopelessly poor...
awk -F '"' '/<package.*?((codepath=)|(name=))+/{print $2}' packages.xml
Reputation: 203664
Without the expected output, and without input containing multiple packages, it's a guess whether this is what you want, but in any case, with any POSIX awk:
$ cat tst.awk
BEGIN {
    OFS = ":"
    # Rename XML attributes to the labels wanted in the output.
    map["nativelibrarypath"] = "path"
    map["codepath"] = "apk"
    # Labels to print, in output order.
    tags[++numTags] = "name"
    tags[++numTags] = "path"
    tags[++numTags] = "apk"
}
$1 == "<package" { inPkg=1 }
$1 == "</package>" { prtPkg(); inPkg=0 }
inPkg {
    # Save every attr="value" field seen inside the current package.
    for (i=1; i<=NF; i++) {
        if ( match($i,/^[[:alnum:]_]+=/) ) {
            tag = substr($i,RSTART,RLENGTH-1)
            tag = (tag in map ? map[tag] : tag)
            val = substr($i,RSTART+RLENGTH)
            gsub(/^"|">?$/,"",val)    # strip the quotes and any trailing ">"
            tag2val[tag] = val
        }
    }
}
END { prtPkg() }
# Print the saved values for one package, then reset for the next one.
function prtPkg(    tag, tagNr) {
    if ("name" in tag2val) {
        for (tagNr=1; tagNr<=numTags; tagNr++) {
            tag = tags[tagNr]
            print tag, tag2val[tag]
        }
    }
    delete tag2val
}
$ awk -f tst.awk file
name:android.permission.WRITE_EXTERNAL_STORAGE
path:/data/data/com.project.t2i/lib
apk:/data/app/com.project.t2i-2.apk
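Note that your input has 2 name attributes and you didn't say which one you wanted output. The script keeps the last one it sees inside the package, which is why the output above shows the name from the item element rather than from the package element. Also your key is multi-line and there are ways to handle that, but since you don't want that output I just saved the first part of it from its first line when populating the tag2val array.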
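If it's the package's own name you want, and you want the values in shell variables as the question asks, here is a minimal sketch of one way to do it. It assumes each `<package ...>` start tag sits on a single line, that all three attributes are present and that their values contain no whitespace, as in the sample input. The function names `pkg_attrs` and `pkg_report` are made up for illustration.

```shell
# Print "name path apk" (space separated), one line per <package ...> start tag.
# Only the start tag line is inspected, so name= on nested <item> elements
# is ignored.
pkg_attrs() {
    awk '
    $1 == "<package" {
        name = path = apk = ""
        for (i = 2; i <= NF; i++) {
            if ( match($i, /^[[:alnum:]_]+=/) ) {
                attr = substr($i, RSTART, RLENGTH - 1)
                val  = substr($i, RSTART + RLENGTH)
                gsub(/^"|">?$/, "", val)   # strip the quotes and any trailing ">"
                if      (attr == "name")              name = val
                else if (attr == "nativelibrarypath") path = val
                else if (attr == "codepath")          apk  = val
            }
        }
        print name, path, apk
    }' "$1"
}

# Read each line into the shell variables name, path and apk and write the
# report. The loop runs in a pipeline, so the variables are only set inside
# the loop body.
pkg_report() {
    pkg_attrs "$1" |
    while read -r name path apk; do
        printf 'name:%s\npath:%s\napk:%s\n' "$name" "$path" "$apk"
    done
}
```

You'd then call something like `pkg_report packages.xml > out.txt`. If a value could ever be empty or contain spaces, the space-separated hand-off to `read` would break and you'd need a different separator.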