Mike Pérez

Reputation: 201

Print just some columns in awk

I was wondering how to extract just the name of the programming language from a text file. I tried the following pattern in AWK, but I cannot get what I want:

($1 ~ /[A-Za-z]*/)  && ( ($3 ~ /-/) || ($4 ~ /-/) )

Any ideas how to do it? As you can see, the lines do not follow a regular format.

In other words, I have the following lines, but I only want to print the programming language name:

2.PAK - AI language with coroutines.  "The 2.PAK Language: Goals and
Description", L.F. Melli, Proc IJCAI 1975.

473L Query - English-like query language for Air Force 473L system.  Sammet
1969, p.665.  "Headquarters USAF Command and Control System Query
Language", Info Sys Sci, Proc 2nd Congress, Spartan Books 1965, pp.57-76.

3-LISP - Brian Smith.  A procedurally reflective dialect of LISP which uses
an infinite tower of interpreters. 

I want to filter them so that only the following lines appear:

2.PAK

473L Query 

3-LISP

Edit: Would the same command work for the following entry?

DML - 

  1. Data Management Language.  Early ALGOL-like language with lists,
graphics, on Honeywell 635.  

  2. "DML: A Meta-language and System for the Generation of Practical and
Efficient Compilers from Denotational Specifications"

I guess I just have to adjust RS and FS so that I get this line?

DML

Thanks in advance!

Upvotes: 0

Views: 190

Answers (2)

cyberz

Reputation: 923

It looks like " - " (a hyphen surrounded by spaces) could be a good field separator. Given the file:

$ cat /tmp/a 
2.PAK - AI language with coroutines.  "The 2.PAK Language: Goals and
Description", L.F. Melli, Proc IJCAI 1975.

473L Query - English-like query language for Air Force 473L system.  Sammet
1969, p.665.  "Headquarters USAF Command and Control System Query
Language", Info Sys Sci, Proc 2nd Congress, Spartan Books 1965, pp.57-76.

3-LISP - Brian Smith.  A procedurally reflective dialect of LISP which uses
an infinite tower of interpreters. 

You could use the following:

$ awk -F ' - ' '/ - /{ print $1 }' /tmp/a
2.PAK
473L Query
3-LISP
$ 
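Regarding the edit in the question: the same one-liner should also pick out the DML entry, because its first line contains " - " and the numbered sub-entries below it do not. A minimal sketch, assuming a POSIX shell and awk; `extract_names` is a hypothetical wrapper name, not part of the original answer:

```shell
#!/bin/sh
# Sketch: print the text before " - " on every line that contains it.
# extract_names is a hypothetical helper; with no file arguments,
# awk reads standard input.
extract_names() {
  awk -F ' - ' '/ - /{ print $1 }' "$@"
}

# The DML entry from the edit, fed through the same filter.
printf '%s\n' \
  'DML - ' \
  '' \
  '  1. Data Management Language.  Early ALGOL-like language with lists,' \
  'graphics, on Honeywell 635.' \
  '' \
  '  2. "DML: A Meta-language and System for the Generation of Practical and' \
  'Efficient Compilers from Denotational Specifications"' | extract_names
# prints: DML
```

This approach is line-based, so any continuation line that happens to contain " - " would also be printed.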

Upvotes: 1

ruakh

Reputation: 183241

If I understand correctly that your file consists of multiline "stanzas" separated by blank lines, and that each stanza begins with a language name followed by " - ", then you can write:

awk 'BEGIN { RS = "\n\n"; FS = " - " } { print $1 }'

The BEGIN block (which runs before the first record is read) sets the record separator RS to "\n\n" (two newlines, i.e., a blank line), so that each stanza is a single AWK record, and sets the field separator FS to " - ", so that the language name is the first field of the stanza. The block { print $1 } then prints the first field of each record.
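One portability caveat: POSIX leaves the behavior of a multi-character RS such as "\n\n" unspecified. gawk and mawk treat it as a regular expression, but some older awk implementations may use only its first character. A sketch of a more portable variant, assuming the stanzas are separated by truly empty lines, uses paragraph mode (RS = "") instead and adds a / - / guard so that stanzas without a separator, such as the numbered DML sub-entries from the edit, are skipped. `print_stanza_names` is a hypothetical name:

```shell
#!/bin/sh
# Sketch: RS = "" (paragraph mode) makes each blank-line-separated stanza
# one record; the / - / guard skips stanzas that contain no " - ".
# print_stanza_names is a hypothetical helper; it reads stdin if no files are given.
print_stanza_names() {
  awk 'BEGIN { RS = ""; FS = " - " } / - / { print $1 }' "$@"
}

# Usage with (shortened) entries from the question.
printf '%s\n' \
  '2.PAK - AI language with coroutines.  "The 2.PAK Language: Goals and' \
  'Description", L.F. Melli, Proc IJCAI 1975.' \
  '' \
  '473L Query - English-like query language for Air Force 473L system.  Sammet' \
  '1969, p.665.' \
  '' \
  '3-LISP - Brian Smith.  A procedurally reflective dialect of LISP which uses' \
  'an infinite tower of interpreters.' | print_stanza_names
# prints:
# 2.PAK
# 473L Query
# 3-LISP
```

Paragraph mode also treats a run of several blank lines as a single separator, which RS = "\n\n" does not.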

Upvotes: 0
