Reputation: 935
Consider a file with the following content:
abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq
In general, awk iterates over its input line by line and performs the given action on each line.
For example:
awk '{print substr($0,8,10)}' file
Output:
hijklmn
wxyzabc
klmnopq
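To make the per-line behaviour visible, here is a minimal sketch that prints the record number NR next to each substring. The temporary file and the variable names (file, per_line) are only for illustration; the data is the sample above.

```shell
# Recreate the sample file in a temporary location (illustrative setup).
file=$(mktemp)
printf '%s\n' abcdefghijklmn pqrstuvwxyzabc defghijklmnopq > "$file"

# With the default RS (a newline) the action runs once per line;
# NR is the number of the record currently being processed.
per_line=$(awk '{print NR ": " substr($0,8,10)}' "$file")
printf '%s\n' "$per_line"

rm -f "$file"
```

Each of the three lines is a separate record, which is why three substrings come out instead of one.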
I would like to know an approach in which the entire contents of the file are treated as a single variable and awk prints just one output.
Example of the desired output:
hijklmnpqr
I am not after this specific output as such. In general, I would appreciate it if anyone could suggest an approach for passing the content of a file to awk as a whole.
Upvotes: 4
Views: 11118
Reputation: 14975
gawk solution
From the docs:
There are times when you might want to treat an entire data file as a single record. The only way to make this happen is to give RS a value that you know doesn’t occur in the input file. This is hard to do in a general way, such that a program always works for arbitrary input files.
$ cat file
abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq
RS must be set to a pattern that does not occur in the file, following Denis Shirokov's suggestion in the docs (thanks @EdMorton):
$ gawk '{print ">>>"$0"<<<<"}' RS='^$' file
>>>abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq
<<<<
The key part of the docs' explanation:
It works by setting RS to ^$, a regular expression that will never match if the file has contents. gawk reads data from the file into tmp, attempting to match RS. The match fails after each read, but fails quickly, such that gawk fills tmp with the entire contents of the file.
(Here tmp is the variable used in the manual's readfile() example.)
So:
$ gawk '{gsub(/\n/,"");print substr($0,8,10)}' RS='^$' file
Returns:
hijklmnpqr
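If you want to convince yourself that the whole file really arrives as one record, a rough check like the one below may help. It assumes gawk is on your PATH (it falls back to a message otherwise); the temporary file and the summary variable are just illustrative names.

```shell
# Recreate the sample file in a temporary location (illustrative setup).
file=$(mktemp)
printf '%s\n' abcdefghijklmn pqrstuvwxyzabc defghijklmnopq > "$file"

if command -v gawk >/dev/null 2>&1; then
    # RS='^$' never matches a non-empty file, so the action runs exactly once:
    # NR is 1 and length($0) counts every character, newlines included.
    summary=$(gawk -v RS='^$' '{print NR, length($0)}' "$file")
else
    summary="gawk not installed"
fi
printf '%s\n' "$summary"

rm -f "$file"
```

With gawk available this should print 1 45: one record, made of three 14-character lines plus three newlines.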
Upvotes: 7
Reputation: 204731
With GNU awk for multi-char RS (best approach):
$ awk -v RS='^$' '{print substr($0,8,10)}' file
hijklmn
pq
With other awks if your input can't contain NUL characters:
$ awk -v RS='\0' '{print substr($0,8,10)}' file
hijklmn
pq
With other awks otherwise:
$ awk '{rec = rec $0 ORS} END{print substr(rec,8,10)}' file
hijklmn
pq
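If you need the portable variant more than once, it can be wrapped in a small shell function. This is only a sketch: slurp_substr is a made-up helper name, and the start/len awk variables are my own choice, not anything standard.

```shell
# slurp_substr FILE START LEN  (hypothetical helper, not a standard tool)
# Rebuilds the file in one awk variable, then prints a substring of it.
slurp_substr() {
    awk -v start="$2" -v len="$3" '
        { rec = rec $0 ORS }                  # re-append the newline awk stripped
        END { print substr(rec, start, len) } # runs once, after the last line
    ' "$1"
}

# Illustrative usage with the sample data from the question.
file=$(mktemp)
printf '%s\n' abcdefghijklmn pqrstuvwxyzabc defghijklmnopq > "$file"
result=$(slurp_substr "$file" 8 10)
printf '%s\n' "$result"
rm -f "$file"
```

One thing to keep in mind: because ORS is appended after every line, rec always ends in a newline, even when the file's last line had none.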
Note that none of those produce the output you say you wanted:
hijklmnpqr
because they do what you asked for, which is to "read file as a whole": a newline is just another character in your input file, nothing special.
To get the output you say you want, you would have to remove all newlines from the file first. You can do that with gsub(/\n/,"") or with various other methods, such as:
$ awk '{rec = rec $0} END{print substr(rec,8,10)}' file
hijklmnpqr
if that's really what you want.
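Another of those "other methods", sketched under the assumption that your awk copes with a long final line that has no trailing newline (the common implementations do, as far as I know): strip the newlines with tr before awk ever sees the data. The temporary file and the joined variable are illustrative names.

```shell
# Recreate the sample file in a temporary location (illustrative setup).
file=$(mktemp)
printf '%s\n' abcdefghijklmn pqrstuvwxyzabc defghijklmnopq > "$file"

# tr deletes every newline, so awk receives the content as a single line
# and the default per-line processing runs exactly once.
joined=$(tr -d '\n' < "$file" | awk '{print substr($0,8,10)}')
printf '%s\n' "$joined"

rm -f "$file"
```

For the sample file this prints hijklmnpqr, the same as the rec = rec $0 version.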
Upvotes: 3