Reputation: 407
I have an XML file that contains unescaped '<' characters inside attribute values. The first thing I tried was to parse the XML using:
xmllint --noout filename.xml
but that doesn't work, because my XML version is 1.1, which is not supported. As an alternative, I started searching for any '<' that is not at the beginning or the end of a line.
This should be fairly easy. I tried:
grep -v '^[<]'
but that does not work. Can someone help?
For example, filename.xml contains:
<instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
<field fieldname="CUR007" value="<EUR>"/>
<field fieldname="C207" value="2023-01-11"/>
<field fieldname="INS160" value="0"/>
<field fieldname="PRD013" value="1020"/>
<field fieldname="PRD150" value="0"/>
<field fieldname="PRD205" value="0"/>
</instrument>
I want the output to be:
<instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
<field fieldname="CUR007" value="<EUR>"/>
Upvotes: 1
Views: 342
Reputation: 23667
I've created a different sample to cover some more cases:
$ cat ip.txt
foo bar < xyz
<123 abc <42> >
<good>
bad > line
$ # get lines having < not at start of line
$ grep '[^[:blank:]].*<' ip.txt
foo bar < xyz
<123 abc <42> >
$ # get lines having > not at end of line
$ grep '>.*[^[:blank:]]' ip.txt
<123 abc <42> >
bad > line
$ # combining the two
$ grep -E '[^[:blank:]].*<|>.*[^[:blank:]]' ip.txt
foo bar < xyz
<123 abc <42> >
bad > line
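As a rough check, the combined pattern can also be run against a shortened copy of the question's sample. This is only a sketch: the temporary file and the variable names `sample` and `result` are made up for illustration.

```shell
# Write a shortened copy of the question's sample to a temporary file.
sample=$(mktemp)
cat > "$sample" << 'EOF'
<instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
<field fieldname="CUR007" value="<EUR>"/>
<field fieldname="C207" value="2023-01-11"/>
</instrument>
EOF

# Keep lines with '<' after a non-blank character,
# or '>' followed by a non-blank character.
result=$(grep -E '[^[:blank:]].*<|>.*[^[:blank:]]' "$sample")
printf '%s\n' "$result"
```

Only the two lines with the embedded `<xyz>` and `<EUR>` values should be printed, since the other lines have their only `<` at the start and their only `>` at the end.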
[:blank:] represents space and tab characters
[^[:blank:]] will match a non-blank character
Upvotes: 1
Reputation: 424993
Search for a < or > other than the first and last non-whitespace characters, which should themselves be angle brackets.
grep '^\s*<.*[<>].*>\s*'
Note that this matches the whole line, so it can be used if you want to do something with the line (rather than just part of it).
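A possible variant, sketched under the assumption that your grep lacks the GNU `\s` extension: `[[:space:]]` stands in for `\s`, a trailing `$` anchors the match to the end of the line, and `-n` prefixes each hit with its line number. The temporary file and the variable names `xml` and `matches` are hypothetical.

```shell
# Write a shortened copy of the question's sample to a temporary file.
xml=$(mktemp)
cat > "$xml" << 'EOF'
<instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
<field fieldname="CUR007" value="<EUR>"/>
<field fieldname="C207" value="2023-01-11"/>
</instrument>
EOF

# Lines that start with '<', end with '>', and contain at least one
# additional '<' or '>' somewhere in between. -n adds line numbers.
matches=$(grep -n '^[[:space:]]*<.*[<>].*>[[:space:]]*$' "$xml")
printf '%s\n' "$matches"
```

With line numbers in hand, the offending values can then be located and escaped by hand or with another tool.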
A test:
grep '^\s*<.*[<>].*>\s*' << EOF
> <instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
> <field fieldname="CUR007" value="<EUR>"/>
> <field fieldname="C207" value="2023-01-11"/>
> <field fieldname="INS160" value="0"/>
> <field fieldname="PRD013" value="1020"/>
> <field fieldname="PRD150" value="0"/>
> <field fieldname="PRD205" value="0"/>
> </instrument>
> EOF
Output:
<instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
<field fieldname="CUR007" value="<EUR>"/>
Upvotes: 1