Reputation: 23
I'm trying to join two data structures. I feel like this should be a simple enough task in bash, but I haven't managed it so far.
I have two data files: file_1 is a list of identifiers, and file_2 is a list of tab-separated entries, where each entry consists of three numeric strings separated by commas (example file below). I'd like to prepend the identifier on each line of file_1 to every entry on the corresponding line of file_2, e.g.
file_1 looks like this:
id_1
id_2
id_3
file_2 looks like this:
1234,543,134 210,1676,8 26,20,6
789,33400,342 8291,3390,890
772,602,3 224,220,1 407,405,2 8,895,7 985,93,4 96,93,3 145,145,3
I would like to have:
id_1,1234,543,134 id_1,210,1676,8 id_1,26,20,6
id_2,789,33400,342 id_2,8291,3390,890
id_3,772,602,3 id_3,224,220,1 id_3,407,405,2 id_3,8,895,7 id_3,985,93,4 id_3,96,93,3 id_3,145,145,3
file_1 and file_2 always have the same number of lines. In file_2, each comma-separated numeric string is always [digits],[digits],[digits] but there can be a variable number of strings on each line, and a variable number of digits within each string.
So far, I've managed to prepend each entry with a constant value, by adding a tab to the start of each line in file_2, then using gsub to replace each tab with the constant I want, e.g.
( printf '\t'; cat file_2.txt ) | awk '{ gsub("\t",",\tconstant,"); print }'
which results in
, constant,1234,543,134, constant,210,1676,8, constant,26,20,6
789,33400,342, constant,8291,3390,890
772,602,3, constant,224,220,1, constant,407,405,2, constant,8,895,7, constant,985,93,4, constant,96,93,3, constant,145,145,3
and from there I can clean up the unwanted comma and tab at the start.
I wanted to build on this by using a while read loop over file_2 and using each line number as a variable, e.g.
while read; do
line=$(awk '{ print NR}')
id_to_add=$(awk -v line=$line 'NR == line' file_1)
( printf '\t'; cat file_2.txt ) | awk -v id=${id_to_add} '{ gsub("\t",",\tid,"); print }'
done < file_2
However, this doesn't work because the variable $line ends up holding all the line numbers of file_2 at once, rather than one line number per iteration, i.e. echo $line returns 1 2 3
I feel like there should be a cleaner way to do this, perhaps using awk's two-file processing, awk 'NR==FNR' file_1 file_2?
Thanks!
Upvotes: 2
Views: 79
Reputation: 203209
$ awk 'NR==FNR{a[NR]=$0; next} {for (i=1; i<=NF; i++) $i = a[FNR] "," $i} 1' file1 file2
id_1,1234,543,134 id_1,210,1676,8 id_1,26,20,6
id_2,789,33400,342 id_2,8291,3390,890
id_3,772,602,3 id_3,224,220,1 id_3,407,405,2 id_3,8,895,7 id_3,985,93,4 id_3,96,93,3 id_3,145,145,3
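For anyone who wants to see what each part is doing, here is the same idea spelled out with comments. Treat it as a sketch rather than a drop-in replacement: the sample files are recreated in a scratch directory (the third line is shortened), the array is renamed to id for readability, and FS and OFS are both set to a tab on the assumption that the tab separators from file_2 should survive in the output. The one-liner above uses the default separators, so it joins the entries with single spaces instead.

```shell
# Recreate (part of) the sample data from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' id_1 id_2 id_3 > "$tmp/file_1"
printf '1234,543,134\t210,1676,8\t26,20,6\n' > "$tmp/file_2"
printf '789,33400,342\t8291,3390,890\n' >> "$tmp/file_2"
printf '772,602,3\t224,220,1\n' >> "$tmp/file_2"   # shortened third line

result=$(awk '
    BEGIN { FS = OFS = "\t" }          # assumption: keep tabs between entries
    NR == FNR { id[NR] = $0; next }    # first file only: store the ID by line number
    {
        # second file: FNR restarts at 1, so id[FNR] is the ID for this line
        for (i = 1; i <= NF; i++)
            $i = id[FNR] "," $i        # prefix every entry with "ID,"
        print
    }
' "$tmp/file_1" "$tmp/file_2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

NR==FNR is only true while awk is reading the first file named on the command line, which is why the order of the two file arguments matters here.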
Upvotes: 1
Reputation: 50750
One way of doing it:
awk 'NR==FNR{a[NR]=($0 ",");next} {OFS=("\t" a[FNR]);$1=(a[FNR] $1)} 1' file1 file2
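It simply updates the records in the second file by setting OFS to a tab followed by the matching ID (and its comma), then prepending the same ID to the first field. Assigning to $1 makes awk rebuild the record with the new OFS, so every remaining entry picks up the ID as well.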
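If the OFS trick looks opaque, it may help to run it on a single line in isolation. The snippet below is only an illustration: the input is the first sample line from the question, fed in with printf, and the ID is passed in by hand through -v rather than read from file1.

```shell
# Assigning to any field makes awk rebuild $0, joining the fields with OFS.
# OFS is set to a tab plus the ID, so every field after the first gets the ID;
# the first field has no separator in front of it, so it is prefixed by hand.
rebuilt=$(printf '1234,543,134\t210,1676,8\t26,20,6\n' |
    awk -v id='id_1,' '{ OFS = "\t" id; $1 = id $1 } 1')

printf '%s\n' "$rebuilt"
```

A side effect of building the separator this way is that the entries stay tab-separated in the output.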
Upvotes: 1