Chan Kim

Reputation: 5919

How can I escape special characters that are passed to a function which contains an awk command?

To make this problem simple to demonstrate, I made a fake XML file like this.

<abc>
      <spirit:addressBlock>
        <spirit:name>cmn700_registers</spirit:name>
          <def>
          </def>
      </spirit:addressBlock>
</abc>

I want to print the lines containing the pattern <spirit:name> inside a block of lines, where the block begins with the pattern <spirit:addressBlock> and ends with </spirit:addressBlock>. I defined a function in .bash_aliases like this.

function SearchPatInBlk {
awk "/$1/{inblk=1} inblk==1&&/$2/{inblk=0} inblk==1&&/$3/{print \$0}" $4
}

So the first and second arguments are the block start and end patterns, the third argument is the pattern of the lines I want to print, and the fourth argument is the XML filename. Then I ran this command in the bash shell.

SearchPatInBlk <spirit:addressBlock> </spirit:addressBlock> <spirit:name> ../../ab21/ab21_cmn700_new10_clst/build/ab21_cmn700/logical/cmn700/ipxact/cmn700_ab21.xml

Of course this gives me an error.

bash: syntax error near unexpected token `<'

So I tried putting escape characters (\) before <, > and /, but that doesn't work. How should I do it?

Upvotes: 0

Views: 98

Answers (3)

pmf

Reputation: 36033

Don't use text processors like sed or awk on structured data. Use a (command-line) XML processor instead. Here are some options:

(Note that the sample given isn't valid XML by itself, as it doesn't declare the spirit namespace. This leaves the elements spirit:addressBlock and spirit:name without their binding, which trips up some of these processors, so you might want to add something along the lines of <abc … xmlns:spirit="…" … > to the sample. But if your actual document does declare them properly, you'll be fine using any of these.)

Using xmlstarlet:

  • requires namespaces to be properly declared, so won't run with the sample given
  • the select (or sel) command makes xmlstarlet perform a query on the input; the --template (or -t) flag starts a new template, the --var option imports a quoted (hence the @Q) value into a variable, and the --value-of (or -v) option evaluates an XPath expression used for extraction and printing
SearchPatInBlk() {
  xmlstarlet sel -t --var fst="${1@Q}" --var snd="${2@Q}" \
    -v '//*[name() = $fst]//*[name() = $snd]/text()' "$3"
}

SearchPatInBlk spirit:addressBlock spirit:name input.xml
  • you could even omit the namespaces in your function call by testing against the nodes' local-name() instead of their name() (but properly declared namespaces in the document are still required)
SearchPatInBlk() {
  xmlstarlet sel -t --var fst="${1@Q}" --var snd="${2@Q}" \
    -v '//*[local-name() = $fst]//*[local-name() = $snd]/text()' "$3"
}

SearchPatInBlk addressBlock name input.xml

Using libxml/xmllint:

  • also requires namespaces to be properly declared, so won't run with the sample given
  • with the --xpath option, this also uses an XPath expression to query and extract, but has no means to import external values, so the function's arguments are injected directly into the expression (note the double quotes around it)
SearchPatInBlk() {
  xmllint --xpath "//*[name() = ${1@Q}]//*[name() = ${2@Q}]/text()" "$3"
}

SearchPatInBlk spirit:addressBlock spirit:name input.xml
  • the change from name() to local-name() in order to allow for omitting the namespaces in the function call can be applied here as well (with the namespaces still required to be properly declared in the document)
SearchPatInBlk() {
  xmllint --xpath \
    "//*[local-name() = ${1@Q}]//*[local-name() = ${2@Q}]/text()" "$3"
}

SearchPatInBlk addressBlock name input.xml

Using kislyuk/yq:

  • doesn't care about namespace declarations, so indeed also runs with the sample given (but namespaces, if present, still need to be spelled out when referencing nodes by name, e.g. spirit:name)
  • the --arg option imports values into variables, and the --raw-output (or -r) flag decodes the result value into raw text (as otherwise it would be JSON-encoded, because under the hood it uses the JSON processor jq)
SearchPatInBlk() {
  xq --arg fst "$1" --arg snd "$2" -r \
    '..[$fst]? | ..[$snd]? | arrays[] // values' "$3"
}

SearchPatInBlk spirit:addressBlock spirit:name input.xml

Using mikefarah/yq:

  • completely ignores the usage of namespaces in names, so it also runs with the sample given (they are actually eliminated there altogether, so spelling them out in fact causes an error)
  • importing values is done via environment variables, and interpreting the input as XML is determined by the input file's extension (otherwise explicitly provide the --input-format xml (or -px) option), while the output is deliberately encoded as YAML using the --output-format yaml (or -oy) option to unquote the results
SearchPatInBlk() {
  fst="$1" snd="$2" yq -oy \
    '.. | .[strenv(fst)]? | .. | [] + .[strenv(snd)]? | .[]' "$3"
}

SearchPatInBlk addressBlock name input.xml

Upvotes: 2

RARE Kpop Manifesto

Reputation: 2809

echo '<abc>
      <spirit:addressBlock>
        <spirit:name>cmn700_registers</spirit:name>
          <def>
          </def>
      </spirit:addressBlock>
</abc>' | 

 mawk 'gsub(/^[^<]*|[^>]+$/,_, $!(NF *= (NF % 2) < (2 < NF)))' \
            ORS= FS='[<][/]' OFS='</' RS='spirit:addressBlock[>]'            

<spirit:name>cmn700_registers</spirit:name>
          <def>
          </def>

Upvotes: 0

Renaud Pacalet

Reputation: 28910

Using a true XML parser would be better than a general-purpose text processor like awk. But if you absolutely need awk, there are several things to fix.

  • Quote your pattern strings.
  • Escape regex operators in your pattern strings.
  • Pass your pattern strings to awk as awk variables, not as parts of the awk script.
  • Use the regex,regex awk range pattern.

Optionally, you could also use more precise regexes and, if your awk is GNU awk, mark the patterns as regex constants (@/.../):

function SearchPatInBlk {
  awk -v v1="$1" -v v2="$2" -v v3="$3" 'v1,v2 {if($0 ~ v3) print}' "$4"
}

SearchPatInBlk '@/^[[:space:]]*[<]spirit:addressBlock[>][[:space:]]*$/' \
  '@/^[[:space:]]*[<][/]spirit:addressBlock[>][[:space:]]*$/' \
  '@/[<]spirit:name[>]/' file
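
If your awk is not GNU awk, the same fixes can be sketched with plain strings used as dynamic regexes. This is only a minimal sketch, not a definitive implementation: the variable names start, end and pat and the temporary sample file are illustrative choices, and it keeps the flag-based logic from the question instead of a range pattern. Quoting the arguments on the command line is what stops bash from treating < and > as redirections.

```shell
# Print lines matching a pattern, but only inside a start/end block.
# $1 = block start regex, $2 = block end regex, $3 = line regex, $4 = file
SearchPatInBlk() {
  awk -v start="$1" -v end="$2" -v pat="$3" '
    $0 ~ start        { inblk = 1 }   # block opens on this line
    inblk && $0 ~ pat { print }       # matching line inside the block
    $0 ~ end          { inblk = 0 }   # block closes on this line
  ' "$4"
}

# Illustrative run on a temporary copy of the sample from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<abc>
      <spirit:addressBlock>
        <spirit:name>cmn700_registers</spirit:name>
          <def>
          </def>
      </spirit:addressBlock>
</abc>
EOF

# Single quotes keep <, > and / away from the shell.
# Prints only the <spirit:name> line found inside the block.
SearchPatInBlk '<spirit:addressBlock>' '</spirit:addressBlock>' \
  '<spirit:name>' "$sample"

rm -f "$sample"
```

In a dynamic regex, <, > and / need no escaping. Note that awk processes backslash escapes in -v assignments, so a pattern that contains backslashes would have to double them.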

Upvotes: 0
