casey

Reputation: 15

How to extract specific lines from a text file using awk?

I have a text file that looks like this:

A   102
B   456
C   678
H    A       B        C      D       E        F      G       H       I       J
    1.18    0.20    0.23    0.05    1.89    0.72    0.11    0.49    0.31    1.45
    3.23    0.06    2.67    1.96    0.76    0.97    0.84    0.77    0.39    1.08

I need to extract every line that starts with B or H, plus the two lines that follow each H line. How can I do this using awk?

The expected output would be:

B   456
H    A       B        C      D       E        F      G       H       I       J
    1.18    0.20    0.23    0.05    1.89    0.72    0.11    0.49    0.31    1.45
    3.23    0.06    2.67    1.96    0.76    0.97    0.84    0.77    0.39    1.08

Any suggestions, please?

Upvotes: 1

Views: 1686

Answers (5)

cppcoder

Reputation: 23145

bash-3.00$ cat t
A   102
B   456
C   678
H    A       B        C      D       E        F      G       H       I       J
    1.18    0.20    0.23    0.05    1.89    0.72    0.11    0.49    0.31    1.45
    3.23    0.06    2.67    1.96    0.76    0.97    0.84    0.77    0.39    1.08

bash-3.00$ awk '{if(( $1 == "B") || ($1 == "H") || ($0 ~ /^ / )) print;}' t
B   456
H    A       B        C      D       E        F      G       H       I       J
    1.18    0.20    0.23    0.05    1.89    0.72    0.11    0.49    0.31    1.45
    3.23    0.06    2.67    1.96    0.76    0.97    0.84    0.77    0.39    1.08

Or, more briefly:

awk '{if($0 ~ /^[BH ]/ ) print;}' t

Or, shorter still:

awk '/^[BH ]/' t

Upvotes: 1

William Pursell

Reputation: 212584

Ignoring the blank line after B in your output (your problem specifications give no indication as to why that blank line is in the output, so I'm assuming it should not be there):

awk '/^H/{t=3} /^B/ || t-- >0' input.file

This prints every line that starts with B, and each line that starts with H along with the two lines that follow it.
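The counter in that one-liner can be written out more explicitly. The following is only a sketch of the same idea, not a replacement for it: the function name `extract_b_and_h` and the variable name `remaining` are made up for illustration, and the number of trailing lines (two) is taken from the question.

```shell
# Hypothetical helper: print lines starting with B, and each line
# starting with H together with the two lines that follow it.
extract_b_and_h() {
    awk '
        # An H line opens a window of three lines: itself plus two more.
        /^H/ { remaining = 3 }
        {
            # Print B lines, and anything still inside the window.
            if (/^B/ || remaining > 0) print
            # Shrink the window once per input line while it is open.
            if (remaining > 0) remaining--
        }
    ' "$@"
}
```

Called as `extract_b_and_h input.file`, it reads the file the same way the one-liner does; with no argument it reads standard input. Unlike the one-liner, the counter here is only decremented while the window is open, which some readers may find easier to follow.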

Upvotes: 1

user1416258

Reputation:

awk '/^[BH ]/' filename.txt > output.txt

EDIT: Updated for OP's edit

Upvotes: 0

Eduardo Ivanec

Reputation: 11862

If H and B aren't the only headers that precede tabular data and you want to omit the other blocks (the requirements aren't fully specified), you need a flip-flop that remembers whether you are currently inside a block you want to keep:

awk '/^[^ 0-9]/ {inblock=0}; /^[BH]/ {inblock=1}; { if (inblock) print }' d.txt
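To see the flip-flop at work, here is the same logic spread over several lines with comments. It is a sketch under the same assumption as the one-liner, namely that data rows start with a space or a digit and every header starts with some other character; the function name `keep_bh_blocks` is hypothetical.

```shell
# Hypothetical helper: keep only the blocks whose header starts with B or H.
keep_bh_blocks() {
    awk '
        # Any line that does not start with a space or digit is a header,
        # so it closes whatever block was open.
        /^[^ 0-9]/ { inblock = 0 }
        # A B or H header (re)opens a block we want to keep.
        /^[BH]/    { inblock = 1 }
        # A bare pattern prints the line whenever the flag is set.
        inblock
    ' "$@"
}
```

For example, if a header line starting with X and its own numeric rows were appended to the sample file, `keep_bh_blocks d.txt` would drop that whole block, whereas a pattern that accepts every line starting with a space would print its rows.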

Upvotes: 0

Dennis Williamson

Reputation: 360615

awk '/^[BH]/ || /^[[:blank:]]*[[:digit:]]/' inputfile

Upvotes: 1
