Praveen Kumar

Reputation: 45

Extracting a field from an XML tag

In an XML file I am searching for the string "<file:write". The file contains complete XML tags, and within each of those tags there is a value field. I am trying to extract that value field into a CSV file together with the file name. The problem is that the field (path= or Path=) appears as the 2nd, 3rd, or 4th column, so I am not able to use the cut command.

Is there a better way of doing this?

find /opt/mortagage/application.xml -type f -exec egrep -ri "<file:write" /dev/null {} + |uniq| sed '/<!--.*-->/d' | sed '/<!--/,/-->/d'

/opt/mortagage/application.xml:              <file:write doc:id="16630" path="${file.location}" doc:name="Save file to directory">
/opt/mortagage/application.xml:                      <file:write doc:name="Write to complete folder" doc:id="18890" path='#["${file.completeLocation}" ++ vars.zipFileName]' config-ref="File_Config_completed">
/opt/mortagage/application.xml:                      <file:write doc:name="Write to complete folder" doc:id="19990" Path='#["${file.completeLocation}" ++ vars.zipFileName]' config-ref="File_Config_completed">

Upvotes: -1

Views: 64

Answers (1)

pmf

Reputation: 36471

A "better way" would be to use a dedicated processor for structured data; in this case, a command-line XML processor can do it easily.

Using kislyuk/yq:

xq -r '.. | ."file:write"? | arrays[] // . | ."@path", ."@Path" | strings' in.xml

Using mikefarah/yq (which completely ignores namespaces):

yq -oy '.. | .write? | select(kind == "map") // .[] | ."+@path" // ."+@Path"' in.xml

Using xmlstarlet:

xmlstarlet sel -t -m '//file:write' -v '@path' -v '@Path' -n in.xml

Using libxml/xmllint:

  • xmllint requires you either to declare the actual namespaces (which you haven't provided in the sample), or to default to ignoring them all by resorting to a local-name() check.
  • xmllint also doesn't support the string(…) function on multiple matches, so the best it can do is output full attribute nodes like path="${file.location}". A workaround is to subsequently use another tool (like awk or sed) to trim them down.
xmllint --xpath '//*[local-name()="write"]/@path | //*[local-name()="write"]/@Path' in.xml |
  sed 's/^[^=]*="//; s/"$//' # removes everything up to the first =" and the final "

All of them output something like:

${file.location}
#["${file.completeLocation}" ++ vars.zipFileName]
#["${file.completeLocation}" ++ vars.zipFileName]
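
Since the question asks for the values in a CSV together with the file name, the output of any of the commands above could be piped through a small helper. This is only a sketch: `to_csv` is a hypothetical function name (not part of any of the tools mentioned), and it assumes each extracted value arrives on its own line and that the file name itself contains no commas or backslashes.

```shell
# to_csv FILE: hypothetical helper that reads one value per line on stdin
# and prints "FILE,"value"" rows. Embedded double quotes are doubled,
# which is the usual CSV escaping convention.
to_csv() {
  awk -v file="$1" '{
    gsub(/"/, "\"\"")                      # escape embedded quotes
    printf "%s,\"%s\"\n", file, $0         # file name, then quoted value
  }'
}

# Possible usage (assumes xmlstarlet is installed and in.xml exists):
#   xmlstarlet sel -t -m '//file:write' -v '@path' -v '@Path' -n in.xml | to_csv in.xml
```

Quoting every value keeps expressions such as `#["${file.completeLocation}" ++ vars.zipFileName]` in a single CSV column even though they contain spaces and quotes.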

Upvotes: 0
