Reputation: 201
I am trying to extract just the programming-language names from a text file. I tried the following AWK pattern, but it does not give me what I want:
($1 ~ /[A-Za-z]*/) && ( ($3 ~ /-/) || ($4 ~ /-/) )
Any ideas on how to do it? As you can see, the lines do not follow a regular format.
In other words, I have the following lines, but I want to print only the programming-language name from each entry:
2.PAK - AI language with coroutines. "The 2.PAK Language: Goals and
Description", L.F. Melli, Proc IJCAI 1975.
473L Query - English-like query language for Air Force 473L system. Sammet
1969, p.665. "Headquarters USAF Command and Control System Query
Language", Info Sys Sci, Proc 2nd Congress, Spartan Books 1965, pp.57-76.
3-LISP - Brian Smith. A procedurally reflective dialect of LISP which uses
an infinite tower of interpreters.
I want to filter the file so that only the following lines appear:
2.PAK
473L Query
3-LISP
Edit: Would the same command also work for the following?
DML -
1. Data Management Language. Early ALGOL-like language with lists,
graphics, on Honeywell 635.
2. "DML: A Meta-language and System for the Generation of Practical and
Efficient Compilers from Denotational Specifications"
I guess I just have to adjust RS and FS so that I get this line?
DML
Thanks in advance!
Upvotes: 0
Views: 190
Reputation: 923
It looks like " - " could be a good separator, given the file:
$ cat /tmp/a
2.PAK - AI language with coroutines. "The 2.PAK Language: Goals and
Description", L.F. Melli, Proc IJCAI 1975.
473L Query - English-like query language for Air Force 473L system. Sammet
1969, p.665. "Headquarters USAF Command and Control System Query
Language", Info Sys Sci, Proc 2nd Congress, Spartan Books 1965, pp.57-76.
3-LISP - Brian Smith. A procedurally reflective dialect of LISP which uses
an infinite tower of interpreters.
you could use the following:
$ awk -F ' - ' '/ - /{ print $1 }' /tmp/a
2.PAK
473L Query
3-LISP
$
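The edited example in the question has an entry whose first line ends in " -" with no trailing space ("DML -"), which the " - " separator would not match. A minimal sketch of one way to cover both shapes, assuming that only the first line of an entry contains " - " or ends in " -" (true for the sample lines, not checked against the whole file). The function name lang_names is hypothetical:

```shell
# Hypothetical helper: for every line containing " - " or ending in " -",
# print the text that precedes the first such separator.
lang_names() {
    awk 'match($0, / -( |$)/) { print substr($0, 1, RSTART - 1) }' "$@"
}
```

It can be called as lang_names /tmp/a, or fed from a pipe. A hyphen inside a name such as 3-LISP is left alone because it has no space in front of it.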
Upvotes: 1
Reputation: 183241
If I understand correctly that your file consists of multiline "stanzas" that are separated by blank lines, and each "stanza" begins with a language name followed by " - ", then you can write:
awk 'BEGIN { RS = "\n\n"; FS = " - " } { print $1 }'
The BEGIN block (which is run before the first record is read) sets the record separator RS to "\n\n" (two newlines, i.e., a blank line), so each of your stanzas is a single AWK record, and the field separator FS to " - ", so the language name is the first "field" of the stanza. The block { print $1 } prints the first field in each record.
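A multi-character RS such as "\n\n" works in gawk, but POSIX leaves the behavior unspecified, so it may not be portable. Below is a sketch of a variant that uses AWK's paragraph mode (RS set to the empty string) instead, still assuming the stanzas are separated by blank lines. It also accepts a " -" that sits at the end of the first line, as in the "DML -" example from the question. The function name stanza_names is hypothetical:

```shell
# Hypothetical helper: treat each blank-line-separated stanza as one record
# (paragraph mode) and print what precedes the first " - " or " -<newline>".
stanza_names() {
    awk 'BEGIN { RS = "" }
         match($0, / -[ \n]/) { print substr($0, 1, RSTART - 1) }' "$@"
}
```

Using match and substr rather than FS sidesteps the fact that, in paragraph mode, a newline always acts as a field separator in addition to FS.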
Upvotes: 0