Reputation: 8162
Here is part of a huge XML file:
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e+04</c1>
<c2 unit="V/(nT*Hz)">8.35950000e-06</c2>
<c3 unit="deg">-1.17930000e+02</c3>
</caldata>
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">5.55810000e+04</c1>
<c2 unit="V/(nT*Hz)">4.43400000e-06</c2>
<c3 unit="deg">-1.58280000e+02</c3>
</caldata>
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">6.00000000e+04</c1>
<c2 unit="V/(nT*Hz)">3.63180000e-06</c2>
<c3 unit="deg">-1.67340000e+02</c3>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e-01</c1>
<c2 unit="V/(nT*Hz)">1.07140000e-02</c2>
<c3 unit="deg">1.48080000e+02</c3>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">5.55800000e-01</c1>
<c2 unit="V/(nT*Hz)">1.33250000e-02</c2>
<c3 unit="deg">1.39110000e+02</c3>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">7.72300000e-01</c1>
<c2 unit="V/(nT*Hz)">1.57750000e-02</c2>
<c3 unit="deg">1.29560000e+02</c3>
I have tried this:
grep '<c1 unit="Hz"' *.xml | cut -f2 -d">"|cut -f1 -d"<"
It works fine, but what I really want is to output the values only when
caldata chopper="off"
and to save the output to a file.
How can I do this?
Upvotes: 2
Views: 479
Reputation: 27070
A solution would be to use an XML grep tool, such as xgrep. I tried it on my machine and got this:
$ xgrep -t -x '//caldata[@chopper="off"]/c1[@unit="Hz"]/text()' test.xml
4.00000000e-01
5.55800000e-01
7.72300000e-01
The secret is the XPath expression:
//caldata[@chopper="off"] - take all caldata elements whose chopper attribute equals off;
c1[@unit="Hz"] - from those caldata elements, get the c1 elements whose unit attribute equals Hz;
text() - from those c1 elements, get only the text content.
To save the result to a file, use the shell's > redirection operator: append it to the command, followed by the name of the file that should receive the output:
$ xgrep -t -x '//caldata[@chopper="off"]/c1[@unit="Hz"]/text()' test.xml > output.xml
$ cat output.xml
4.00000000e-01
5.55800000e-01
7.72300000e-01
I don't know whether you are able to install an extra tool like this, but if you are, it is probably your best solution.
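If xgrep cannot be installed, the same XPath idea can be sketched with Python's standard library. This is only an illustration under assumptions: the <root> wrapper element is made up (the question shows a fragment without its container), c1_values is a hypothetical helper name, and ElementTree supports only a subset of XPath, so .text is used instead of text().

```python
import xml.etree.ElementTree as ET

# Shortened sample; the <root> wrapper is an assumption, because the
# question shows only a fragment of the real file.
SAMPLE = """<root>
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e+04</c1>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e-01</c1>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">5.55800000e-01</c1>
</caldata>
</root>"""


def c1_values(xml_text, chopper="off"):
    """Return the text of every c1 (unit Hz) inside caldata with the given chopper value."""
    root = ET.fromstring(xml_text)
    # Same selection as the xgrep expression, minus the trailing text() step.
    path = f".//caldata[@chopper='{chopper}']/c1[@unit='Hz']"
    return [elem.text for elem in root.findall(path)]


for value in c1_values(SAMPLE):
    print(value)
```

Since the values go to standard output, the same > redirection shown above would save them to a file.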
Upvotes: 3
Reputation: 3896
Since you're using grep, I'm going to assume some flavor of *nix and a command-line solution.
In that case, you probably want to look at something like zorba, which will run an XQuery against your input document and output the parts you want.
If the container element in your data was foo, the XQuery would contain:
for $c in /foo/caldata
return if ($c/@chopper="off")
then $c else ()
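For comparison, the same loop-and-filter idea can be sketched with Python's standard library. Treat it as a rough illustration under assumptions: the container element is called foo as above, iter_c1 is a hypothetical helper name, and the sample is shortened. It reads the document incrementally with iterparse, which may help when the file is huge.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample; the <foo> container is the same assumption as in the answer.
SAMPLE = b"""<foo>
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c1 unit="Hz">6.00000000e+04</c1>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c1 unit="Hz">4.00000000e-01</c1>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c1 unit="Hz">7.72300000e-01</c1>
</caldata>
</foo>"""


def iter_c1(source, chopper="off"):
    """Yield the c1 (unit Hz) text of each caldata element with the given chopper value."""
    # "end" events fire once an element and all of its children have been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "caldata":
            continue
        if elem.get("chopper") == chopper:
            c1 = elem.find("c1[@unit='Hz']")
            if c1 is not None:
                yield c1.text
        # Drop the children of this caldata element once it has been handled.
        elem.clear()


for value in iter_c1(io.BytesIO(SAMPLE)):
    print(value)
```

In real use, a file name or an open binary file would be passed instead of the BytesIO object.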
Upvotes: 0
Reputation: 11690
This will do:
awk '/chopper="off"/,/<\/caldata>/' file.xml | grep 'unit="Hz"' | sed 's/^.*">//;s/<.*$//'
The first command (awk) keeps only the chunks that start at a line containing chopper="off" and end at the closing </caldata> tag. The second command (grep) keeps only the lines with the numbers you want. The third command (sed) extracts the number from each line.
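The three steps of that pipeline can also be written as one small line-based filter. The sketch below is only an illustration: c1_from_lines is a hypothetical helper name, and it assumes, like the pipeline, that every tag sits on its own line as in the sample.

```python
import re

# Shortened sample; the last block is cut off, as in the question.
SAMPLE = """<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e+04</c1>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e-01</c1>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">5.55800000e-01</c1>"""

C1_HZ = re.compile(r'<c1 unit="Hz">([^<]*)</c1>')


def c1_from_lines(lines, chopper="off"):
    """Collect c1 (unit Hz) values from blocks whose opening line has the given chopper value."""
    inside = False
    values = []
    for line in lines:
        # Range start: the opening caldata line with the wanted attribute (the awk step).
        if f'chopper="{chopper}"' in line:
            inside = True
        if inside:
            # Keep only the Hz lines and pull out the number (the grep and sed steps).
            match = C1_HZ.search(line)
            if match:
                values.append(match.group(1))
        # Range end: the closing tag of the current block.
        if "</caldata>" in line:
            inside = False
    return values


for value in c1_from_lines(SAMPLE.splitlines()):
    print(value)
```

Passing an open file object instead of SAMPLE.splitlines() would process a real file line by line without loading it all at once.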
Upvotes: 2