Reputation: 65
I have a large number of files (around 500). Each file contains two columns, and the first column is the same in every file. I want to join all the files into a single file using gawk.
For example,
File 1
a 123
b 221
c 904
File 2
a 298
b 230
c 102
and so on. I want a final file like the one below:
Final file
a 123 298
b 221 230
c 904 102
I have found scripts that can join two files, but I need to join multiple files.
Upvotes: 4
Views: 315
Reputation: 47
I have encountered this problem very frequently.
I strongly encourage you to look into the getline function in gawk. The syntax
getline var < filename
reads the next line of filename into var and can be used to solve your problem.
Alternatively, I would suggest using another language that solves this problem much more easily. Typically I need about 5 lines of code for this standard problem:
awk 'BEGIN {
    # getline returns 1 for each line read, 0 at end of file, -1 on error
    while ((getline x < "filename") > 0) {
        # ... commands involving x, such as split and print
    }
    close("filename")
}'
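A minimal runnable sketch of that getline idea, assuming every file lists the same keys in the same line order (as in the question). awk reads the first file normally, and for each of its lines getline pulls the next line from every other file. The sample files f1, f2, f3 and the scratch directory are hypothetical, created only for the demonstration.

```shell
# Hypothetical sample files f1, f2, f3 in a scratch directory
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a 123\nb 221\nc 904\n' > f1
printf 'a 298\nb 230\nc 102\n' > f2
printf 'a 500\nb 600\nc 700\n' > f3

result=$(awk '
BEGIN {
    # Keep the files after the first one aside; awk itself reads only ARGV[1].
    for (i = 2; i < ARGC; i++) { other[i] = ARGV[i]; ARGV[i] = "" }
}
{
    line = $0
    for (i = 2; i < ARGC; i++) {
        # getline returns 1 when a line was read, 0 at EOF, -1 on error
        if ((getline x < other[i]) > 0) {
            split(x, f)            # f[1] is the key, f[2] the value
            line = line " " f[2]
        }
    }
    print line
}' f1 f2 f3)

printf '%s\n' "$result"
cd / && rm -rf "$dir"
```

If one of the other files is shorter, getline returns 0 for it and that file simply stops contributing values.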
Upvotes: 1
Reputation: 7834
awk 'FNR==NR{arr[$1]=$2; next} {print $1, arr[$1], $2}' file1 file2
based on this
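The same FNR==NR idea can be stretched to any number of files. A sketch, assuming every key appears in every file: the first file fixes the row order, and each file's second column is appended to its key. f1, f2, f3 are hypothetical sample files.

```shell
# Hypothetical sample files f1, f2, f3 in a scratch directory
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a 123\nb 221\nc 904\n' > f1
printf 'a 298\nb 230\nc 102\n' > f2
printf 'a 500\nb 600\nc 700\n' > f3

result=$(awk '
FNR == NR { order[++n] = $1 }        # first file: remember the key order
{ row[$1] = row[$1] OFS $2 }         # every file: append column 2 to its key
END {
    for (i = 1; i <= n; i++)
        print order[i] row[order[i]]
}' f1 f2 f3)

printf '%s\n' "$result"
cd / && rm -rf "$dir"
```

A key missing from some file just produces a shorter row; this sketch does not pad the gaps.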
Upvotes: 0
Reputation: 77105
For given sample files:
$ head f*
==> f1 <==
a 123
b 221
c 904
==> f2 <==
a 298
b 230
c 102
==> f3 <==
a 500
b 600
c 700
$ awk '{a[FNR]=((a[FNR])?a[FNR]FS$2:$0)}END{for(i=1;i<=FNR;i++) print a[i]}' f*
a 123 298 500
b 221 230 600
c 904 102 700
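The one-liner above, spread out with comments. This sketch tests (FNR in row) instead of the value itself, and remembers the longest file instead of relying on FNR in END (which only holds the line count of the last file read); for the sample files the output is the same. The files are hypothetical copies of the sample data.

```shell
# Hypothetical sample files f1, f2, f3 in a scratch directory
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a 123\nb 221\nc 904\n' > f1
printf 'a 298\nb 230\nc 102\n' > f2
printf 'a 500\nb 600\nc 700\n' > f3

result=$(awk '
{
    if (FNR in row)
        row[FNR] = row[FNR] FS $2   # later files: append only column 2
    else
        row[FNR] = $0               # first file: keep the whole line
    if (FNR > max)
        max = FNR                   # length of the longest file so far
}
END {
    for (i = 1; i <= max; i++)
        print row[i]
}' f*)

printf '%s\n' "$result"
cd / && rm -rf "$dir"
```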
Using paste and awk together (assuming the first column is the same and present in all files). Running paste f* gives the following result:
$ paste f*
a 123 a 298 a 500
b 221 b 230 b 600
c 904 c 102 c 700
Pipe that to awk to remove the extra columns:
$ paste f* | awk '{printf "%s ",$1;for(i=2;i<=NF;i+=2) printf "%s%s",$i,(i==NF?RS:FS)}'
a 123 298 500
b 221 230 600
c 904 102 700
You can redirect the output to another file.
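One caveat with paste f* for around 500 files: the shell expands f* in lexical order (f1 f10 f100 f101 ... f2 ...), so the columns come out in that order. A sketch, assuming hypothetical file names of the form f followed by a number (no spaces), that sorts the names numerically before calling paste:

```shell
# Hypothetical files f1..f10, each with keys a, b, c and value = file number
dir=$(mktemp -d)
cd "$dir" || exit 1
for i in 1 2 3 4 5 6 7 8 9 10; do
    printf 'a %s\nb %s\nc %s\n' "$i" "$i" "$i" > "f$i"
done

# The glob f* sorts lexically (f1 f10 f2 ...); sort the names numerically
# on the digits that follow the leading "f" instead.
files=$(printf '%s\n' f* | sort -k1.2n)

# $files is left unquoted on purpose so it splits into separate arguments
# (safe here only because the names contain no spaces).
result=$(paste $files | awk '{ printf "%s", $1; for (i = 2; i <= NF; i += 2) printf " %s", $i; print "" }')

printf '%s\n' "$result"
cd / && rm -rf "$dir"
```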
Upvotes: 5
Reputation: 3838
You could try something like this:
$ ls
f1.txt f2.txt f3.txt
$ awk '($0 !~ /#/){a[$1]=a[$1]" "$2} END {for(i in a){print i""a[i]}}' *.txt
a 123 298 299
b 221 230 231
c 904 102 103
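Note that for (i in a) visits keys in an unspecified order, so the rows are not guaranteed to come out as a, b, c. A sketch that pipes the result through sort on the key (in gawk you could instead set PROCINFO["sorted_in"] = "@ind_str_asc"). The condition ($0 !~ /#/) skips any line containing a #. The .txt files are hypothetical copies of the sample data above.

```shell
# Hypothetical sample files f1.txt, f2.txt, f3.txt in a scratch directory
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a 123\nb 221\nc 904\n' > f1.txt
printf 'a 298\nb 230\nc 102\n' > f2.txt
printf 'a 299\nb 231\nc 103\n' > f3.txt

# Append column 2 to each key, skipping lines that contain "#",
# then sort the rows on the key because for-in order is unspecified.
result=$(awk '($0 !~ /#/) { a[$1] = a[$1] " " $2 }
    END { for (k in a) print k a[k] }' *.txt | sort -k1,1)

printf '%s\n' "$result"
cd / && rm -rf "$dir"
```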
Upvotes: 0