Richard Rublev

Reputation: 8162

How to grep my xml file and save output?

Here is just part of a huge XML file:

   <caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">4.00000000e+04</c1>
      <c2 unit="V/(nT*Hz)">8.35950000e-06</c2>
      <c3 unit="deg">-1.17930000e+02</c3>
    </caldata>
    <caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">5.55810000e+04</c1>
      <c2 unit="V/(nT*Hz)">4.43400000e-06</c2>
      <c3 unit="deg">-1.58280000e+02</c3>
    </caldata>
    <caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">6.00000000e+04</c1>
      <c2 unit="V/(nT*Hz)">3.63180000e-06</c2>
      <c3 unit="deg">-1.67340000e+02</c3>
    </caldata>
    <caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">4.00000000e-01</c1>
      <c2 unit="V/(nT*Hz)">1.07140000e-02</c2>
      <c3 unit="deg">1.48080000e+02</c3>
    </caldata>
    <caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">5.55800000e-01</c1>
      <c2 unit="V/(nT*Hz)">1.33250000e-02</c2>
      <c3 unit="deg">1.39110000e+02</c3>
    </caldata>
    <caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">7.72300000e-01</c1>
      <c2 unit="V/(nT*Hz)">1.57750000e-02</c2>
      <c3 unit="deg">1.29560000e+02</c3>

I have tried this:

grep '<c1 unit="Hz"' *.xml | cut -f2 -d">"|cut -f1 -d"<"

It works fine, but what I really want is output only when caldata chopper="off", and to save that output to a file. How can I do this?

Upvotes: 2

Views: 479

Answers (3)

brandizzi

Reputation: 27070

A solution would be to use an XML grep, such as xgrep. I tried it myself on my machine and got this:

$ xgrep -t -x '//caldata[@chopper="off"]/c1[@unit="Hz"]/text()' test.xml 
4.00000000e-01
5.55800000e-01
7.72300000e-01

The secret is the XPath expression:

  • //caldata[@chopper="off"] - take all caldata elements whose chopper attribute equals off;
  • c1[@unit="Hz"] - from those caldata elements, get the c1 elements whose unit attribute equals Hz;
  • text() - from those c1 elements, get only the text content.

To save it to an output file, just use the > redirector from the shell. We just need to add it after the command, and then add the name of the file to get the output:

$ xgrep -t -x '//caldata[@chopper="off"]/c1[@unit="Hz"]/text()' test.xml  > output.xml
$ cat output.xml 
4.00000000e-01
5.55800000e-01
7.72300000e-01
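
The redirection step can be sketched on its own. In this minimal example, printf is only a stand-in for the real extraction command (so it runs without xgrep installed), and values.txt is a hypothetical file name. The point it shows is that > creates or overwrites the file, while >> appends to it, which matters if you run the command once per input file.

```shell
#!/bin/sh
# printf stands in for the real extraction command (an assumption for this sketch).
# ">" creates values.txt, or overwrites it if it already exists.
printf '%s\n' 4.00000000e-01 5.55800000e-01 > values.txt

# ">>" appends to the existing file instead of replacing its contents.
printf '%s\n' 7.72300000e-01 >> values.txt

# Show what ended up in the file.
cat values.txt
```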

I don't know whether you are able to install a custom tool like this, but if you can, it may be your best solution.

Upvotes: 3

Terry Carmen

Reputation: 3896

Since you're using grep, I'm going to assume some flavor of *nix and a command-line solution.

In that case, you probably want to look at something like zorba, which will run an XQuery against your input document and output the parts you want.

If the container element in your data were foo, the XQuery would contain:

for $c in /foo/caldata
return if ($c/@chopper = "off")
then $c else ()

Upvotes: 0

Bolo

Reputation: 11690

This will do:

awk '/chopper="off"/,/<\/caldata>/' file.xml | grep 'unit="Hz"' | sed 's/^.*">//;s/<.*$//'

The first command (awk) keeps only the chunks that start at a line containing chopper="off" and end at the closing </caldata> tag. The second command (grep) keeps only the lines with the numbers you want. The third command (sed) extracts the number from each line. To save the result, append > output.txt to the end of the pipeline.
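
The same filtering can also be sketched as a single awk program that tracks whether the current line is inside a chopper="off" block. This is only a sketch under assumptions: sample.xml is a hypothetical cut-down version of the data in the question, the root element name is made up, and each tag is assumed to sit on its own line as in the excerpt. It is line-based text matching, not real XML parsing.

```shell
#!/bin/sh
# Hypothetical sample file, shortened from the data shown in the question.
cat > sample.xml <<'EOF'
<root>
  <caldata chopper="on" gain_1="0">
    <c1 unit="Hz">4.00000000e+04</c1>
  </caldata>
  <caldata chopper="off" gain_1="0">
    <c1 unit="Hz">4.00000000e-01</c1>
  </caldata>
  <caldata chopper="on" gain_1="0">
    <c1 unit="Hz">6.00000000e+04</c1>
  </caldata>
  <caldata chopper="off" gain_1="0">
    <c1 unit="Hz">5.55800000e-01</c1>
  </caldata>
</root>
EOF

awk '
  # Opening tag: remember whether this block has chopper="off".
  /<caldata /   { inside = ($0 ~ /chopper="off"/) }
  # Closing tag: the block is over.
  /<\/caldata>/ { inside = 0 }
  # Inside an "off" block, strip the tags around the Hz value and print it.
  inside && /<c1 unit="Hz">/ {
      sub(/^.*<c1 unit="Hz">/, "")
      sub(/<\/c1>.*$/, "")
      print
  }
' sample.xml > output.txt

cat output.txt
```

Because the flag is reset at every closing tag, a chopper="on" block that follows an "off" block is skipped, which a plain grep on the c1 lines cannot do.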

Upvotes: 2
