Ashish K

Awk to read file as a whole

Consider a file with the following content:

abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq

In general, awk reads its input line by line (each line is one record) and performs the given action on every line.

For example:

awk '{print substr($0,8,10)}' file

Output:

hijklmn
wxyzabc
klmnopq
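As a small illustration of this record-by-record behaviour (a sketch only; the temporary file and the `out` variable exist just to make it self-contained), printing awk's built-in record counter NR next to each result shows that the action runs once per line:

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' abcdefghijklmn pqrstuvwxyzabc defghijklmnopq > "$tmp"

# With the default RS (a newline), every line is its own record,
# so the action runs three times and NR counts 1, 2, 3.
out=$(awk '{print NR ": " substr($0, 8, 10)}' "$tmp")
printf '%s\n' "$out"
# 1: hijklmn
# 2: wxyzabc
# 3: klmnopq

rm -f "$tmp"
```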

I would like to know an approach in which the entire contents of the file are treated as a single value, so that awk prints just one output.

Example desired output:

hijklmnpqr

I'm not specifically after the desired output for this example; in general, I would appreciate an approach for providing the contents of a file to awk as a whole.

Answers (2)

Juan Diego Godoy Robles

This is a gawk solution.

From the docs:

There are times when you might want to treat an entire data file as a single record. The only way to make this happen is to give RS a value that you know doesn’t occur in the input file. This is hard to do in a general way, such that a program always works for arbitrary input files.


$ cat file
abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq

RS must be set to a pattern that is not present in the file, following Denis Shirokov's suggestion in the docs (thanks @EdMorton):

$ gawk '{print ">>>"$0"<<<<"}' RS='^$' file
>>>abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq
<<<<

The trick is explained in this part of the docs:

It works by setting RS to ^$, a regular expression that will never match if the file has contents. gawk reads data from the file into tmp, attempting to match RS. The match fails after each read, but fails quickly, such that gawk fills tmp with the entire contents of the file.


So:

$ gawk '{gsub(/\n/,"");print substr($0,8,10)}' RS='^$' file

Returns:

hijklmnpqr

Ed Morton

With GNU awk for multi-char RS (best approach):

$ awk -v RS='^$' '{print substr($0,8,10)}' file
hijklmn
pq

With other awks, if your input can't contain NUL characters:

$ awk -v RS='\0' '{print substr($0,8,10)}' file
hijklmn
pq
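This option depends on the input having no NUL bytes. A minimal sketch for checking that precondition beforehand, using only `tr` and `cmp` (the `has_nul` function name is hypothetical, not part of the answer): delete every NUL and compare the result with the original file.

```shell
#!/bin/sh
# Sample input from the question, in a temporary file (illustrative only).
tmp=$(mktemp)
printf '%s\n' abcdefghijklmn pqrstuvwxyzabc defghijklmnopq > "$tmp"

# has_nul FILE (hypothetical helper): succeeds if FILE contains a NUL byte.
# tr deletes every NUL; if the result differs from FILE, one was present.
has_nul() {
    ! tr -d '\000' < "$1" | cmp -s - "$1"
}

if has_nul "$tmp"; then
    printf '%s\n' "contains NUL: RS='\0' is not safe for this file"
else
    printf '%s\n' "no NUL bytes: RS='\0' can be used"
fi
rm -f "$tmp"
```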

With other awks otherwise (for example, if your input may contain NUL characters):

$ awk '{rec = rec $0 ORS} END{print substr(rec,8,10)}' file
hijklmn
pq
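Another possibility, closer to the question's wording about treating the file as a single variable, is to pass its contents to awk through the environment and read them from the POSIX `ENVIRON` array. This is a sketch under stated assumptions: the variable name `contents` is arbitrary, command substitution strips trailing newlines, shell variables cannot hold NUL bytes, and very large files may exceed the system's environment size limit. Unlike `-v`, `ENVIRON` does not interpret backslash escapes in the value.

```shell
#!/bin/sh
# Sample input from the question, in a temporary file (illustrative only).
tmp=$(mktemp)
printf '%s\n' abcdefghijklmn pqrstuvwxyzabc defghijklmnopq > "$tmp"

# Export the whole file as one environment variable for this awk only.
# Only a BEGIN block is used, so awk does not read any input itself.
# Note: $(cat ...) drops trailing newlines; NUL bytes cannot be passed.
out=$(contents=$(cat "$tmp") awk 'BEGIN {
    s = ENVIRON["contents"]
    print substr(s, 8, 10)
}')
printf '%s\n' "$out"
rm -f "$tmp"
```

As with the approaches above, the newline between the first two lines is part of the value, so this prints `hijklmn`, then `pq` on the next line.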

Note that none of those produce the output you say you wanted:

hijklmnpqr

because they do what you say you wanted (a newline is just another character in your input file, nothing special):

"read file as a whole"

Getting the output you say you want would require removing all newlines from the file first. You can do that with gsub(/\n/,"") or various other methods, such as:

$ awk '{rec = rec $0} END{print substr(rec,8,10)}' file
hijklmnpqr

if that's really what you want.
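As one of those other methods, here is a sketch that removes the newlines with `tr` before awk ever sees the data. The whole file then arrives as a single line without a terminating newline, which common awk implementations still read as one final record:

```shell
#!/bin/sh
# Sample input from the question, in a temporary file (illustrative only).
tmp=$(mktemp)
printf '%s\n' abcdefghijklmn pqrstuvwxyzabc defghijklmnopq > "$tmp"

# Delete every newline first; awk then sees one unterminated record.
out=$(tr -d '\n' < "$tmp" | awk '{print substr($0, 8, 10)}')
printf '%s\n' "$out"   # hijklmnpqr
rm -f "$tmp"
```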
