samos_ravenshmir
samos_ravenshmir

Reputation: 11

Match up to on one column of file with another file and replace different column using bash

I'm a bit of a noob with this so forgive my ignorance if I am not clear enough. I have two files. File1 looks like below.

file1

AA--:Aa-to-Bb:VGG sometext 1223344 10000 sometext
AB--:Aa-to-Bb:VGG sometext 1223344 12000 sometext
CC--:Dd-to-Ee:VGG sometext 1223344 44000 sometext
AA--:Zz-to-Yy:VGG sometext 1223344 51000 sometext
DD--:Aa-to-Dd:VGG sometext 1223344 36000 sometext

File2 looks similar but there are differences in columns 1 and 4.

file2

CC--:Dd-to-Ee:VGG6645 sometext 1223344 3000 sometext
AA--:Zz-to-Yy:VGG2244 sometext 1223344 1000 sometext
AA--:Aa-to-Bb:VGG12345 sometext 1223344 12200 sometext
AB--:Aa-to-Bb:VGG76523 sometext 1223344 33300 sometext
EE--:Dd-to-Ee:VGG45123 sometext 1223344 76500 sometext
DD--:Aa-to-Dd:VGG3486 sometext 1223344 400 sometext
AA--:Dd-to-Ee:VGG4512 sometext 1223344 22300 sometext
AA--:Zz-to-Dd:VGG98765 sometext 1223344 7000 sometext
CC--:Dd-to-Zz:VGG43576 sometext 1223344 900 sometext
FF--:Zz-to-Yy:VGG5645 sometext 1223344 91200 sometext
CC--:Zz-to-Ee:VGG23456 sometext 1223344 3400 sometext
AA--:Ff-to-Yy:VGG1111 sometext 1223344 51000 sometext

I am trying to do some auto-processing via a bash script. One of the tasks is to compare the two files. If an entry in file1 column 1 matches an entry in file2 column 1 up to and including the "VGG", then the value on column 4 in file1 replaces the value in column 4 in file2.

new file2

CC--:Dd-to-Ee:VGG6645 sometext 1223344 44000 sometext
AA--:Zz-to-Yy:VGG2244 sometext 1223344 51000 sometext
AA--:Aa-to-Bb:VGG12345 sometext 1223344 10000 sometext
AB--:Aa-to-Bb:VGG76523 sometext 1223344 12000 sometext
EE--:Dd-to-Ee:VGG45123 sometext 1223344 76500 sometext
DD--:Aa-to-Dd:VGG3486 sometext 1223344 36000 sometext
AA--:Dd-to-Ee:VGG4512 sometext 1223344 22300 sometext
AA--:Zz-to-Dd:VGG98765 sometext 1223344 7000 sometext
CC--:Dd-to-Zz:VGG43576 sometext 1223344 900 sometext
FF--:Zz-to-Yy:VGG5645 sometext 1223344 91200 sometext
CC--:Zz-to-Ee:VGG23456 sometext 1223344 3400 sometext
AA--:Ff-to-Yy:VGG1111 sometext 1223344 51000 sometext

I have tried using various awk and sed commands, as well as grep, sort, and others and I cannot get the results I am looking for. Any help or suggestions you have would be greatly appreciated.

Upvotes: 0

Views: 462

Answers (2)

potong
potong

Reputation: 58371

This might work for you (GNU sed):

sed -r 's|^(\S+)(\s(\S+)){3}\s.*|/^\1/s/\\S+/\3/4|' file2 | sed -rf - -i file1

This makes a sed script from file2 and then runs it against file1.

The sed script generated from file2, matches a line in the object file using the first column value and then replaces the fourth column in the object file with the value in the its fourth column.

Upvotes: 0

sjsam
sjsam

Reputation: 21955

Below awk may help

$ awk 'NR==FNR{rec[$1]=$4;next}{temp=$1;sub(/VGG[0-9]+$/,"VGG",temp)} temp in rec{$4=rec[temp]}1' file1 file2

Output

CC--:Dd-to-Ee:VGG6645 sometext 1223344 44000 sometext
AA--:Zz-to-Yy:VGG2244 sometext 1223344 51000 sometext
AA--:Aa-to-Bb:VGG12345 sometext 1223344 10000 sometext
AB--:Aa-to-Bb:VGG76523 sometext 1223344 12000 sometext
EE--:Dd-to-Ee:VGG45123 sometext 1223344 76500 sometext
DD--:Aa-to-Dd:VGG3486 sometext 1223344 36000 sometext
AA--:Dd-to-Ee:VGG4512 sometext 1223344 22300 sometext
AA--:Zz-to-Dd:VGG98765 sometext 1223344 7000 sometext
CC--:Dd-to-Zz:VGG43576 sometext 1223344 900 sometext
FF--:Zz-to-Yy:VGG5645 sometext 1223344 91200 sometext
CC--:Zz-to-Ee:VGG23456 sometext 1223344 3400 sometext
AA--:Ff-to-Yy:VGG1111 sometext 1223344 51000 sometext

If you need to rewrite file2

$ awk 'NR==FNR{rec[$1]=$4;next}
      {temp=$1;sub(/VGG[0-9]+$/,"VGG",temp)}
      temp in rec{$4=rec[temp]}1' file1 file2 >filetmp && mv filetmp file2

Upvotes: 1

Related Questions