Mike Pérez

Reputation: 201

How to filter columns in awk?

I was wondering how to filter the following lines in AWK:

DSL - 

  1. Digital Simulation Language.  Extensions to FORTRAN to simulate analog
computer functions.  "DSL/90 - A Digital Simulation Program for Continuous
System Modelling", Proc SJCC 28, AFIPS (Spring 1966).  Version: DSL/90 for
the IBM 7090.  Sammet 1969, p.632.

FLIP - 

  1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).

  2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.

  3. Formal LIst Processor.  Early language for pattern-matching on LISP
structures.  Similar to CONVERT.  "FLIP, A Format List Processor", W.
Teitelman, Memo MAC-M-263, MIT 1966.

So I can get something like this:

DSL

FLIP

I am using the following statements in AWK:

BEGIN { RS = "\n\n\n" ;  FS = " - " } 

{ print $1 }

But what I get is just this:

DSL

Thanks in advance!

Upvotes: 0

Views: 805

Answers (5)

technosaurus

Reputation: 7802

Assuming the format is constant (no spaces in first entry):

awk '{ if ($2 == "-") print $1 }' file

Edit: but if you had an entry like:

Objective C -
...

You would need something like:

awk '{ if ($NF == "-") { $NF = ""; print } }' file

awk is really good at parsing flat files that are in a predictable format.
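Assigning an empty string to $NF makes awk rebuild the record, so the printed name keeps a trailing space. If that matters, one possible variation is to strip the separator with sub() instead. This is only a sketch: it assumes header lines start in column 1 and end with a lone "-", and the `headers` function name and the sample entries are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print header names, including multi-word ones.
# Assumption: headers start in column 1 and their last field is "-".
headers() {
    awk '/^[^ \t]/ && $NF == "-" {
        sub(/[ \t]*-[ \t]*$/, "")   # drop the trailing " -" and any blanks
        print
    }' "$@"
}

# Invented sample input: a multi-word header and one with a trailing space.
result=$(printf '%s\n' 'Objective C -' '' '  1. Some description.' '' 'FLIP - ' | headers)
printf '%s\n' "$result"
```

Called with file names, `headers` reads those files; with no arguments it reads standard input, as in the usage line above.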

Upvotes: 2

Ed Morton

Reputation: 203209

@JonathanLeffler gave you a good awk answer to your specific question, but if you're going to be working on files with that format a lot, you may want to consider reformatting them so that records are separated by blank lines and each list item is on a single line, e.g.:

$ cat file
DSL -

  1. Digital Simulation Language.  Extensions to FORTRAN to simulate analog
computer functions.  "DSL/90 - A Digital Simulation Program for Continuous
System Modelling", Proc SJCC 28, AFIPS (Spring 1966).  Version: DSL/90 for
the IBM 7090.  Sammet 1969, p.632.

FLIP -

  1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).

  2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.

  3. Formal LIst Processor.  Early language for pattern-matching on LISP
structures.  Similar to CONVERT.  "FLIP, A Format List Processor", W.
Teitelman, Memo MAC-M-263, MIT 1966.

$ awk '!/^[[:space:]]*$/{printf "%s%s", (NF==2 && /-[[:space:]]*$/ ? rs rs : (/^ +[[:digit:]]+\./ ? rs : "")), $0; rs="\n"} END{print ""}' file
DSL -
  1. Digital Simulation Language.  Extensions to FORTRAN to simulate analogcomputer functions.  "DSL/90 - A Digital Simulation Program for ContinuousSystem Modelling", Proc SJCC 28, AFIPS (Spring 1966).  Version: DSL/90 forthe IBM 7090.  Sammet 1969, p.632.

FLIP -
  1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).
  2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.
  3. Formal LIst Processor.  Early language for pattern-matching on LISPstructures.  Similar to CONVERT.  "FLIP, A Format List Processor", W.Teitelman, Memo MAC-M-263, MIT 1966.

That way you can process the output easily to print or do whatever else you want, e.g.

1) to print every header line plus first bullet item:

$ awk '...' file | awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"} {print $1,$2}'
DSL -
  1. Digital Simulation Language.  Extensions to FORTRAN to simulate analogcomputer functions.  "DSL/90 - A Digital Simulation Program for ContinuousSystem Modelling", Proc SJCC 28, AFIPS (Spring 1966).  Version: DSL/90 forthe IBM 7090.  Sammet 1969, p.632.

FLIP -
  1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).

2) to print the header line plus the second bullet item of just the "FLIP" record:

$ awk '...' file | awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"} /^FLIP -/{print $1,$3}'
FLIP -
  2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.

3) to print the header line plus a count of the bullet items for that header:

$ awk '...' file | awk 'BEGIN{RS=""; FS=OFS="\n"} {print $1 NF-1}'
DSL - 1
FLIP - 3

etc., etc.
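In the output above the wrapped lines are joined with no space between them ("analogcomputer"). If you would rather keep the words apart, a variation along these lines might do. It is a sketch, not the script used above: the `reflow` and `sample` function names and the shortened sample data are my own, and it assumes headers are exactly two fields ending in "-" and list items start with spaces followed by a number and a dot.

```shell
#!/bin/sh
# Hypothetical helper: put each header and each list item on one line,
# joining wrapped continuation lines with a single space.
reflow() {
    awk '
        /^[ \t]*$/ { next }                        # ignore blank lines
        NF == 2 && $2 == "-" {                     # header such as "DSL -"
            if (seen++) printf "\n\n"              # blank line between records
            printf "%s", $0
            next
        }
        /^ +[0-9]+\./ { printf "\n%s", $0; next }  # start of a list item
        { printf " %s", $0 }                       # wrapped continuation line
        END { if (seen) print "" }                 # terminate the last line
    ' "$@"
}

# Shortened, illustrative copy of the data from the question.
sample() {
cat <<'EOF'
DSL -

  1. Digital Simulation Language.  Extensions to FORTRAN to simulate analog
computer functions.

FLIP -

  1. Early assembly language on G-15.

  2. Formal LIst Processor.  Early language for pattern-matching on LISP
structures.
EOF
}

sample | reflow

# Paragraph mode (RS="") on the result: header plus number of list items.
counts=$(sample | reflow | awk 'BEGIN { RS = ""; FS = "\n" } { print $1, NF - 1 }')
printf '%s\n' "$counts"
```

Because each record then occupies one paragraph with one line per item, the paragraph-mode one-liners shown earlier should work on this output unchanged.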

Upvotes: 1

Mr_SD

Reputation: 1

If all the lines you want to skip start with a space, this will work (blank lines come through as empty lines, and $1 keeps the space before the -):

awk -F"-" '{if (substr($1,1,1)!=" ")print $1}'

Upvotes: 0

Jonathan Leffler

Reputation: 753525

It appears that you're looking for lines that contain exactly two words, the second of which is -. If so, then you could write:

awk 'NF == 2 && $2 == "-" { print $1 }'

You could further qualify it to insist that $1 starts at the beginning of the line (no leading blanks):

awk '$0 !~ /^ / && NF == 2 && $2 == "-" { print $1 }'

Both of these print lines containing just DSL and FLIP on the given data.
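To try this without creating a file first, here is a small self-contained sketch that pipes a shortened copy of the question's data into the second command. The `sample` function and the `result` variable are names I picked for illustration; they are not part of the question.

```shell
#!/bin/sh
# Shortened, illustrative copy of the data from the question.
sample() {
cat <<'EOF'
DSL -

  1. Digital Simulation Language.  Extensions to FORTRAN to simulate analog
computer functions.  "DSL/90 - A Digital Simulation Program for Continuous
System Modelling", Proc SJCC 28, AFIPS (Spring 1966).

FLIP -

  1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).

  2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.
EOF
}

# Keep only lines that start in column 1 and consist of a name followed by "-".
result=$(sample | awk '$0 !~ /^ / && NF == 2 && $2 == "-" { print $1 }')
printf '%s\n' "$result"
```

The wrapped line containing "DSL/90 - A Digital Simulation Program" is not picked up because it has more than two fields.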

Upvotes: 1

Kent

Reputation: 195029

A short grep line can do it for you (the -P option needs GNU grep):

grep -Po '.*(?= -\s*$)' file
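On systems where grep has no Perl-compatible regex support, a sed command along these lines should give the same result. It is a sketch under the same assumption as the grep version, namely that only header lines end in " -"; the sample input and the `result` variable are invented for illustration.

```shell
#!/bin/sh
# Print the text before a trailing " -" on lines that end with one,
# using only POSIX sed features (-n with the p flag prints matches only).
result=$(printf '%s\n' 'DSL - ' '' '  1. Digital Simulation Language.' '' 'FLIP -' |
    sed -n 's/ -[[:space:]]*$//p')
printf '%s\n' "$result"
```

As with the grep lookahead, trailing whitespace after the "-" is tolerated and removed.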

Upvotes: 0
