Reputation: 33
I have some files named file1, file2, file3, etc. These files are in a folder f1. The contents of the files are shown below. I would like to count the unique pairs (the first column) in each file. Some files have no data; for those, zero should be printed. How can I do this with awk? Your suggestions would be appreciated.
file1
1586-1081 1586 1081 B-A NZ-OD1 3.01273
1586-1081 1586 1081 B-A NZ-OD2 2.69347
1589-1100 1589 1100 B-A NH1-OE1 3.80491
1589-1085 1589 1085 B-A NH2-OE2 2.7109
file2
43-415 43 415 B-A OE1-NH1 2.84503
43-415 43 415 B-A OE1-NH2 2.99614
Desired output
file1 3
file2 1
Upvotes: 2
Views: 62
Reputation: 77155
With GNU awk you can use BEGINFILE and ENDFILE blocks.
$ cat file1
1586-1081 1586 1081 B-A NZ-OD1 3.01273
1586-1081 1586 1081 B-A NZ-OD2 2.69347
1589-1100 1589 1100 B-A NH1-OE1 3.80491
1589-1085 1589 1085 B-A NH2-OE2 2.7109
$ cat file2
43-415 43 415 B-A OE1-NH1 2.84503
43-415 43 415 B-A OE1-NH2 2.99614
$ awk 'BEGINFILE{delete a}{!a[$1]++}ENDFILE{print FILENAME, length(a)}' file1 file2
file1 3
file2 1
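Spelled out with comments, the one-liner does the following (a readable sketch of the same GNU awk program):

# BEGINFILE runs before the first record of each input file (GNU awk extension):
# clear the array of first-column values seen, so counts don't leak between files.
BEGINFILE { delete a }

# For every record, use the first column as an array key;
# only the distinct keys matter here, the counts themselves are not used.
{ a[$1]++ }

# ENDFILE runs after the last record of each input file:
# length(a) is the number of distinct first-column values in that file.
ENDFILE { print FILENAME, length(a) }

Because BEGINFILE and ENDFILE also run for files that contain no records, a file with no data prints 0, which covers the empty-file case from the question. For example, with a hypothetical empty file3 in the same directory, the command should give:

$ awk 'BEGINFILE{delete a}{!a[$1]++}ENDFILE{print FILENAME, length(a)}' file1 file2 file3
file1 3
file2 1
file3 0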
You don't have to specify every file. If you want to run it on all files under the current directory, just use a glob (i.e. *) to reference them all, as shown below.
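For example, assuming the directory contains only file1 and file2, the glob form should produce the same per-file counts:

$ cd f1
$ awk 'BEGINFILE{delete a}{!a[$1]++}ENDFILE{print FILENAME, length(a)}' *
file1 3
file2 1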
Upvotes: 6