j3pinter

Reputation: 33

Shell script - Smart replace in file with lookup in second file

I have two files, one data file and one lookup file.

One field of the data file must be replaced with a value that is found in the lookup file.

The data file looks like:

2013-04-24;1;0.1635;1.4135
2013-04-24;1;0.9135;1.4135
2013-04-24;2;0.9135;1.4135

The lookup file looks like:

1;2ab1e4c0-de4d-11e2-a934-0f0479162b1b
2;2ab21e90-de4d-11e2-9ce8-d368d9512bad
3;2ab2582e-de4d-11e2-bb5f-6b1f6c4437f8

The result must be:

2013-04-24 2ab1e4c0-de4d-11e2-a934-0f0479162b1b 0.1635 1.4135
2013-04-24 2ab1e4c0-de4d-11e2-a934-0f0479162b1b 0.9135 1.4135
2013-04-24 2ab21e90-de4d-11e2-9ce8-d368d9512bad 0.9135 1.4135

I know how to use awk to read the data file and change the field separator.

    awk 'BEGIN { FS = ";"; OFS = " " }
        { print $1, $2, $3, $4 }' "$1" > "$1.updated"

But I don't know a smart way, in a shell script, to look up field $2 in the lookup file and replace the original value with the UUID.

The lookup file will never be large; in extreme situations it will hold at most 1000 records.

A solution in Bash or Perl would be appreciated too.

Upvotes: 3

Views: 2336

Answers (4)

John B

Reputation: 3646

You could use an all-Bash solution.

declare -A string    # associative array (Bash 4+), keyed by the lookup ID
while IFS=\; read -r key stored; do
    string[$key]=$stored
done < lookup_file
while IFS=\; read -r date key data1 data2; do
    echo "$date ${string[$key]} $data1 $data2"
done < data_file

This stores the UUIDs from the lookup file in an array keyed by their ID, then looks each one up by the ID in the second field as it reads the data file.

Upvotes: 0

nwk

Reputation: 4050

awk has "arrays" (which actually function like hashes/dictionaries) that work quite well for this.

awk 'BEGIN { FS = ";"; OFS = " " }
     {
         if (NR == FNR)
             values[$1] = $2
         else
             print $1, values[$2], $3, $4
     }' lookup data
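The `NR == FNR` test is what tells the two files apart: `NR` counts records across all input, while `FNR` restarts at 1 for each file, so the two are equal only while the first file is being read. A minimal sketch to see this, assuming two throwaway files (the names `lookup.demo` and `data.demo` and the helper `show_counters` are made up for illustration):

```shell
# Hypothetical demo files, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '1;aaa\n2;bbb\n' > "$tmpdir/lookup.demo"
printf '2013-04-24;1;0.1635;1.4135\n' > "$tmpdir/data.demo"

# Print both counters for every record, and which branch NR == FNR would take.
show_counters() {
    awk '{ print "NR=" NR, "FNR=" FNR, (NR == FNR ? "first-file" : "later-file") }' "$@"
}

show_counters "$tmpdir/lookup.demo" "$tmpdir/data.demo"
```

One caveat with this idiom: if the first file is empty, `NR == FNR` also holds while the second file is read, so the data lines would be treated as lookup lines.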

Upvotes: 0

twalberg

Reputation: 62369

This is what join is for, although it does require the two input files to be sorted on the field you want to match on:

sort -t\; -k2,2 datafile.txt > datafile.tmp
sort -t\; -k1,1 lookup.txt > lookup.tmp
join -t\; -1 2 -2 1 -o 1.1,2.2,1.3,1.4 datafile.tmp lookup.tmp | tr ';' ' '

If you're using bash, you could combine that all into one line and skip the temporary files:

join -t\; -1 2 -2 1 -o 1.1,2.2,1.3,1.4 <(sort -t\; -k2,2 datafile.txt) <(sort -t\; -k1,1 lookup.txt) | tr ';' ' '
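Because the data file is sorted on the ID first, the output comes back ordered by ID rather than in the original row order. If the original order matters, one possible workaround is to tag each row with its line number before sorting and sort back on that tag afterwards. This is only a sketch: the helper name `join_keep_order` and the temporary file names are hypothetical, and the sample rows are the question's rows deliberately reordered.

```shell
# Sample data in a temporary directory (the question's rows, reordered on purpose).
tmpdir=$(mktemp -d)
printf '%s\n' '2013-04-24;2;0.9135;1.4135' \
              '2013-04-24;1;0.1635;1.4135' \
              '2013-04-24;1;0.9135;1.4135' > "$tmpdir/datafile.txt"
printf '%s\n' '1;2ab1e4c0-de4d-11e2-a934-0f0479162b1b' \
              '2;2ab21e90-de4d-11e2-9ce8-d368d9512bad' > "$tmpdir/lookup.txt"

join_keep_order() {   # hypothetical helper: $1 = data file, $2 = lookup file
    sort -t';' -k1,1 "$2" > "$tmpdir/lookup.sorted"
    # Prefix each data row with its line number; the ID is then field 3.
    awk -F';' -v OFS=';' '{ print NR, $0 }' "$1" |
        sort -t';' -k3,3 |
        join -t';' -1 3 -2 1 -o 1.1,1.2,2.2,1.4,1.5 - "$tmpdir/lookup.sorted" |
        sort -t';' -k1,1n |   # restore the original row order
        cut -d';' -f2- |      # drop the line-number column
        tr ';' ' '
}

join_keep_order "$tmpdir/datafile.txt" "$tmpdir/lookup.txt"
```

Note that `join` by default prints only rows that have a match, so a data row whose ID is missing from the lookup file is dropped in both this sketch and the commands above.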

Upvotes: 2

jaypal singh

Reputation: 77065

This should work for you:

awk -F';' 'NR==FNR{a[$1]=$2;next}{$2=a[$2]}1' lookup data
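  • Set the input field separator to ;
  • Run through the lookup file, building an array a keyed by column 1 with column 2 as the value; next skips the remaining rules for those lines
  • Once the lookup file is loaded in memory, replace the second column of each data line with the matching array value. Assigning to $2 rebuilds the line with the default output separator (a space), and the trailing 1 prints it.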
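With the one-liner as written, an ID that is missing from the lookup file produces an empty second field, because referencing `a[$2]` for an unknown key yields an empty string. A possible variation, assuming you would rather keep the original ID in that case (that policy is an assumption, not something the question asks for; the helper name `replace_ids` and the ID `9` are made up for the demo):

```shell
# Demo files in a temporary directory; ID 9 is intentionally absent from the lookup.
tmpdir=$(mktemp -d)
printf '%s\n' '1;2ab1e4c0-de4d-11e2-a934-0f0479162b1b' > "$tmpdir/lookup"
printf '%s\n' '2013-04-24;1;0.1635;1.4135' \
              '2013-04-24;9;0.9135;1.4135' > "$tmpdir/data"

replace_ids() {   # hypothetical helper: $1 = lookup file, $2 = data file
    awk -F';' '
        NR == FNR { a[$1] = $2; next }   # first file: remember ID -> UUID
        ($2 in a) { $2 = a[$2] }         # data file: swap the ID only when it is known
        { $1 = $1; print }               # rebuild the record so the space OFS is applied
    ' "$1" "$2"
}

replace_ids "$tmpdir/lookup" "$tmpdir/data"
```

The `$2 in a` test checks for the key without creating an empty array entry, and `$1 = $1` forces awk to rebuild unmatched lines too, so every output line is space-separated.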

Upvotes: 7
