louis xie

Reputation: 1422

grep return matching line and n lines before matching record

I'm working in a restricted bash environment (it has grep and sed, among other tools, but not awk) and want to automate some routine work quickly. I'm currently using "grep -B3 keyword filename" and would like to know how to do this more efficiently with the very limited tools I have.

How do I use bash to grep for the symbol "111AA2026" and print the "record" line, which sits 3 lines above the matching line, together with the matching line itself, for an XML file like this:

<record name="111111H2" />
<items>
  <field name="Electronic Identifier" value="1"/>
  <field name="Symbol" value="111AA2026"/>
  <field name="Full Symbol" value="111AA202622MARFUT"/>
  <field name="System Identifier" value="1"/>
  <field name="System Identifier Description" value="Description"/>
</items>
<record name="111111N1" />
<items>
  <field name="Electronic Identifier" value="2"/>
  <field name="Symbol" value="111AA2026"/>
  <field name="Full Symbol" value="111AA202621JULFUT"/>
  <field name="System Identifier" value="2"/>
  <field name="System Identifier Description" value="Description"/>
</items>
<record name="111111Q1" />
<items>
  <field name="Electronic Identifier" value="3"/>
  <field name="Symbol" value="111AA2026"/>
  <field name="Full Symbol" value="111AA202621AUGFUT"/>
  <field name="System Identifier" value="3"/>
  <field name="System Identifier Description" value="Description"/>
</items>
<record name="111111U1" />
<items>
  <field name="Electronic Identifier" value="4"/>
  <field name="Symbol" value="111AA2026"/>
  <field name="Full Symbol" value="111AA202621SEPFUT"/>
  <field name="System Identifier" value="4"/>
  <field name="System Identifier Description" value="Description"/>
</items>
<record name="111111Z1" />
<items>
  <field name="Electronic Identifier" value="5"/>
  <field name="Symbol" value="111AA2026"/>
  <field name="Full Symbol" value="111AA202621DECFUT"/>
  <field name="System Identifier" value="5"/>
  <field name="System Identifier Description" value="Description"/>
</items>

Note that the actual file contains many different "Symbol" values.

Sample output

<record name="111111H2" />
 <field name="Symbol" value="111AA2026"/>
--
<record name="111111N1" />
 <field name="Symbol" value="111AA2026"/>
--
<record name="111111Q1" />
 <field name="Symbol" value="111AA2026"/>
--
<record name="111111U1" />
 <field name="Symbol" value="111AA2026"/>
--
<record name="111111Z1" />
 <field name="Symbol" value="111AA2026"/>

My main difficulty is getting grep to return the matching line plus the line 3 lines above it, not extracting attributes from an XML file.

Upvotes: 1

Views: 116

Answers (2)

potong

Reputation: 58430

This might work for you (GNU sed):

sed -nE '/record/{:a;N;/Symbol/!ba;/111AA2026/s/(\n).*(\1.*)/\2\1--/p}' file

Gather up the lines from a line containing record through the next line containing Symbol. If the collection contains the literal 111AA2026, print its first and last lines followed by a -- delimiter. Note that a -- is also printed after the final match.
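
To make the steps easier to follow, here is a sketch of the same sed script spread over several lines with comments, run against a small two-record sample. The second record (name 222222H2, symbol 222BB2026) is hypothetical and only there to show that non-matching records are skipped. Like the one-liner, it assumes GNU sed, since back-references in -E patterns are an extension.

```shell
# Hypothetical two-record sample: only the first record carries the symbol we want.
sample='<record name="111111H2" />
<items>
  <field name="Electronic Identifier" value="1"/>
  <field name="Symbol" value="111AA2026"/>
  <field name="Full Symbol" value="111AA202622MARFUT"/>
</items>
<record name="222222H2" />
<items>
  <field name="Electronic Identifier" value="2"/>
  <field name="Symbol" value="222BB2026"/>
  <field name="Full Symbol" value="222BB202622MARFUT"/>
</items>'

out=$(printf '%s\n' "$sample" | sed -nE '
  # Start collecting when a line contains "record".
  /record/ {
    :a
    # Append the next input line to the pattern space.
    N
    # Keep appending until the pattern space contains "Symbol".
    /Symbol/!ba
    # For the wanted symbol, keep the first line and the last line,
    # append a "--" line, and print the result.
    /111AA2026/ s/(\n).*(\1.*)/\2\1--/p
  }
')

printf '%s\n' "$out"
```

On this sample the script should print the first record line, its Symbol line, and a closing -- line, and nothing for the second record.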

Alternative using grep only:

grep -B3 '111AA2026' file | grep 'record\|"Symbol"\|--'
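
One thing to watch in the pipeline above: the bare pattern 111AA2026 also matches the "Full Symbol" line, and it would match any longer symbol that merely contains the same characters. A possible tightening, sketched below, matches the whole attribute as a fixed string (-F) and only passes through real separator lines (^--$). The middle record (name 999999H2, symbol 111AA20269) is a hypothetical near-miss added for illustration; -B is a GNU/BSD grep option, as in the original command.

```shell
# Sample with two wanted records and one hypothetical near-miss (111AA20269).
sample='<record name="111111H2" />
<items>
  <field name="Electronic Identifier" value="1"/>
  <field name="Symbol" value="111AA2026"/>
  <field name="Full Symbol" value="111AA202622MARFUT"/>
</items>
<record name="999999H2" />
<items>
  <field name="Electronic Identifier" value="9"/>
  <field name="Symbol" value="111AA20269"/>
  <field name="Full Symbol" value="111AA2026922MARFUT"/>
</items>
<record name="111111N1" />
<items>
  <field name="Electronic Identifier" value="2"/>
  <field name="Symbol" value="111AA2026"/>
  <field name="Full Symbol" value="111AA202621JULFUT"/>
</items>'

# First grep: match the exact Symbol attribute and keep 3 lines of leading context.
# Second grep: keep only the record line, the Symbol line, and the "--" separators.
out=$(printf '%s\n' "$sample" \
  | grep -B3 -F 'name="Symbol" value="111AA2026"' \
  | grep -e '<record' -e '"Symbol"' -e '^--$')

printf '%s\n' "$out"
```

Because the closing quote is part of the fixed string, the 111AA20269 record is not picked up.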

Upvotes: 1

Ionuț G. Stan

Reputation: 179129

Not sure if this is what you're looking for, but it outputs something very similar to your sample output (without the -- separator lines, which the sed filter drops).

cat temp.xml \
  | grep -B3 '"111AA2026"' \
  | sed -n '/<record/p;/"Symbol/p'

The sed step, annotated:

# sed prints every line by default; the -n flag turns that off,
# so we print the lines we want ourselves with the "p" command.
sed -n '
  # [p]rint all lines that contain: <record
  /<record/ p
  # [p]rint all lines that contain: "Symbol
  /"Symbol/ p
'
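
Since the goal is to automate routine lookups, the grep and sed steps could be wrapped in a small function that takes the symbol as an argument. This is only a sketch: find_records is a hypothetical name, the function reads the XML from standard input, and it adds a third sed address so the -- separators from grep are kept. The record 222222X1 with symbol 222BB2026 is made up for the example.

```shell
# Hypothetical helper: for the exact symbol given as $1, print each <record>
# line and its "Symbol" line, reading the XML from standard input.
find_records() {
  grep -B3 -F "name=\"Symbol\" value=\"$1\"" \
    | sed -n '/<record/p; /"Symbol"/p; /^--$/p'
}

# Small sample: two records for 111AA2026 and one hypothetical record for 222BB2026.
sample='<record name="111111H2" />
<items>
  <field name="Electronic Identifier" value="1"/>
  <field name="Symbol" value="111AA2026"/>
</items>
<record name="222222X1" />
<items>
  <field name="Electronic Identifier" value="2"/>
  <field name="Symbol" value="222BB2026"/>
</items>
<record name="111111N1" />
<items>
  <field name="Electronic Identifier" value="3"/>
  <field name="Symbol" value="111AA2026"/>
</items>'

out_a=$(printf '%s\n' "$sample" | find_records 111AA2026)
out_b=$(printf '%s\n' "$sample" | find_records 222BB2026)

printf '%s\n' "$out_a"
```

With a real file the call would look like: find_records 111AA2026 < temp.xml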

Upvotes: 2
