B Verulam

Reputation: 23

How to take elements from each line of a file, and prepend them to each entry in each line of a different file?

I'm trying to join two data structures. I feel this should be simple enough in bash, but I haven't managed it so far.

I have two data files: file_1 is a list of identifiers, and file_2 is a list of tab-separated entries, where each entry consists of three numeric strings separated by commas (example below). I'd like to prepend the identifier on each line of file_1 to every entry on the corresponding line of file_2, e.g.

file_1 looks like this:

id_1
id_2
id_3

file_2 looks like this:

1234,543,134    210,1676,8  26,20,6
789,33400,342   8291,3390,890
772,602,3   224,220,1   407,405,2   8,895,7 985,93,4    96,93,3 145,145,3

I would like to have:

id_1,1234,543,134   id_1,210,1676,8 id_1,26,20,6
id_2,789,33400,342  id_2,8291,3390,890
id_3,772,602,3  id_3,224,220,1  id_3,407,405,2  id_3,8,895,7    id_3,985,93,4   id_3,96,93,3    id_3,145,145,3

file_1 and file_2 always have the same number of lines. In file_2, each entry is always [digits],[digits],[digits], but the number of entries per line and the number of digits in each numeric string both vary.

What I've done so far

So far, I've managed to prepend a constant value to the entries by adding a leading tab with printf and then using gsub to replace each tab with the constant I want, e.g. ( printf '\t'; cat file_2.txt ) | awk '{ gsub("\t",",\tconstant,"); print }', which results in

,   constant,1234,543,134,  constant,210,1676,8,    constant,26,20,6
789,33400,342,  constant,8291,3390,890
772,602,3,  constant,224,220,1, constant,407,405,2, constant,8,895,7,   constant,985,93,4,  constant,96,93,3,   constant,145,145,3

and from there I can clean up the unwanted comma and tab at the start.

I wanted to build on this by using a while read loop over file_2 and using each line number as a variable, e.g.

while read; do 
line=$(awk '{ print NR}')
id_to_add=$(awk -v line=$line 'NR == line' file_1)
( printf '\t'; cat file_2.txt ) | awk -v id=${id_to_add} '{ gsub("\t",",\tid,"); print }'
done < file_2

However, this doesn't work: instead of holding the number of the current line, $line ends up holding every line number of file_2 at once, i.e. echo $line returns 1 2 3

I feel like there should be a cleaner way to do this, perhaps using awk's two-file processing, awk 'NR==FNR' file_1 file_2 ?

Thanks!

Upvotes: 2

Views: 79

Answers (2)

Ed Morton

Reputation: 203209

$ awk 'NR==FNR{a[NR]=$0; next} {for (i=1; i<=NF; i++) $i = a[FNR] "," $i} 1' file1 file2
id_1,1234,543,134 id_1,210,1676,8 id_1,26,20,6
id_2,789,33400,342 id_2,8291,3390,890
id_3,772,602,3 id_3,224,220,1 id_3,407,405,2 id_3,8,895,7 id_3,985,93,4 id_3,96,93,3 id_3,145,145,3
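
Note that the command above splits and rejoins on awk's default whitespace separators, so the tabs in file2 come out as single spaces, as the output shows. If the tab delimiters need to survive, one possible variation is to set both FS and OFS to a tab. The sketch below recreates the question's sample data in a temporary directory; the names tmp and result are only illustrative and not part of the original answer:

```shell
# Recreate the question's sample data (entries in file2 are tab-separated).
tmp=$(mktemp -d)
printf '%s\n' id_1 id_2 id_3 > "$tmp/file1"
printf '1234,543,134\t210,1676,8\t26,20,6\n' > "$tmp/file2"
printf '789,33400,342\t8291,3390,890\n' >> "$tmp/file2"
printf '772,602,3\t224,220,1\t407,405,2\t8,895,7\t985,93,4\t96,93,3\t145,145,3\n' >> "$tmp/file2"

# Same two-file idiom, but with FS and OFS both set to a tab so that
# awk rebuilds each record with tabs instead of single spaces.
result=$(awk 'BEGIN{FS=OFS="\t"}
              NR==FNR{a[NR]=$0; next}
              {for (i=1; i<=NF; i++) $i = a[FNR] "," $i} 1' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

This assumes the identifiers in file1 contain no tabs and that both files have the same number of lines, as stated in the question.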

Upvotes: 1

oguz ismail

Reputation: 50750

One way of doing it:

awk 'NR==FNR{a[NR]=($0 ",");next} {OFS=("\t" a[FNR]);$1=(a[FNR] $1)} 1' file1 file2

It rebuilds each record of the second file by

  1. prepending the corresponding id from the first file to the first field,
  2. appending that id to the output field separator (OFS), so it also lands in front of every remaining field.
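
The second step relies on awk re-joining the fields with the current OFS whenever a field is assigned, so setting OFS per record and then assigning $1 rewrites every separator on that line. A minimal illustration of that mechanism on a single made-up line, where the literal prefix "X," is hypothetical and stands in for a[FNR]:

```shell
# Assigning to $1 makes awk rebuild $0, joining the fields with the current OFS.
# Here OFS is a tab followed by the prefix, so every later field gets the prefix too.
result=$(printf '1,2,3\t4,5,6\t7,8,9\n' | awk '{OFS="\tX,"; $1 = "X," $1} 1')
printf '%s\n' "$result"
```

Because the tab is part of OFS, this approach keeps the output tab-separated.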

Upvotes: 1
