AWK: Edit values in columns when a pattern is met

Question

I have a tab separated csv file with 3500 lines. On this file, I have to edit the values in columns 5 and 6, depending on the values of columns 3, 4 and 5.

Rules:

We just have to edit lines where the value in column 5 does not exceed "3500".

As you can see the value in column 6 is the concanetacion of columns 5 and 7.

If the values in columns 3 and 4 are:

"AR" "ES" => add 10000 to the value in column 5.
"BE" "FR" => add 20000 to the value in column 5.
"BE" "NL" => add 30000 to the value in column 5.
"BR" "PT" => add 40000 to the value in column 5.
"DK" "DA" => add 50000 to the value in column 5.

The file contains

voc_sales_ren   "QSA"   "AR"    "ES"    "100"   "100reto"   "reto"
voc_sales_ren   "QSA"   "AR"    "ES"    "11"    "11reto"    "reto"
voc_sales_ren   "QSA"   "AR"    "ES"    "9654236"   "9654236expe"   "expe"
voc_sales_ren   "QSA"   "BE"    "FR"    "1078"  "1078reto"  "reto"
voc_sales_ren   "QSA"   "BE"    "NL"    "027460652" "027460652expe" "expe"
voc_sales_ren   "QSA"   "BE"    "NL"    "027460652" "027460652reto" "reto"
voc_sales_ren   "QSA"   "BR"    "PT"    "1045"  "1045reto"  "reto"
voc_sales_ren   "QSA"   "BR"    "PT"    "1046"  "1046reto"  "reto"
voc_sales_ren   "QSA"   "BR"    "PT"    "1094"  "1094reto"  "reto"
voc_sales_ren   "QSA"   "DK"    "DA"    "017187074" "017187074reto" "reto"
voc_sales_ren   "QSA"   "DK"    "DA"    "017187090" "017187090reto" "reto"
voc_sales_ren   "QVN"   "DK"    "DA"    "384"   "384reto"   "reto"
voc_sales_ren   "QVN"   "DK"    "DA"    "387"   "387reto"   "reto"
voc_sales_ren   "QVN"   "BE"    "NL"    "1019"  "1019reto"  "reto"

Expected output:

voc_sales_ren   "QSA"   "AR"    "ES"    "10100" "10100reto" "reto"
voc_sales_ren   "QSA"   "AR"    "ES"    "10011" "10011reto" "reto"
voc_sales_ren   "QSA"   "AR"    "ES"    "9654236"   "9654236expe"   "expe"
voc_sales_ren   "QSA"   "BE"    "FR"    "21078" "21078reto" "reto"
voc_sales_ren   "QSA"   "BE"    "NL"    "027460652" "027460652expe" "expe"
voc_sales_ren   "QSA"   "BE"    "NL"    "027460652" "027460652reto" "reto"
voc_sales_ren   "QSA"   "BR"    "PT"    "41045" "41045reto" "reto"
voc_sales_ren   "QSA"   "BR"    "PT"    "41046" "41046reto" "reto"
voc_sales_ren   "QSA"   "BR"    "PT"    "41094" "41094reto" "reto"
voc_sales_ren   "QSA"   "DK"    "DA"    "017187074" "017187074reto" "reto"
voc_sales_ren   "QSA"   "DK"    "DA"    "017187090" "017187090reto" "reto"
voc_sales_ren   "QVN"   "DK"    "DA"    "50384" "50384reto" "reto"
voc_sales_ren   "QVN"   "DK"    "DA"    "50387" "50387reto" "reto"
voc_sales_ren   "QVN"   "BE"    "NL"    "301019"    "301019reto"    "reto"

Andrew · Accepted Answer

Give this a shot...

#!/usr/bin/awk -f

BEGIN {
        to_add["\"AR\"" "\"ES\""] = 10000
        to_add["\"BE\"" "\"FR\""] = 20000
        to_add["\"BE\"" "\"NL\""] = 30000
        to_add["\"BR\"" "\"PT\""] = 40000
        to_add["\"DK\"" "\"DA\""] = 50000
        OFS = "   "
    }

    {
        split($5, num, "\"")
    }

num[2] > 3500   {
        print
    }

num[2] <= 3500 {
        split($7, cat, "\"")
        $5 = num[2] + to_add[$3 $4]
        $6 = "\"" $5 cat[2] "\""
        $5 = "\"" $5 "\""
        $5 = $5
        print
    }

I'm not sure if the spacing in your file is essential, so I've made no particular attempt to preserve it, just setting everything to 3 spaces. The output is:

voc_sales_ren   "QSA"   "AR"   "ES"   "10100"   "10100reto"   "reto"
voc_sales_ren   "QSA"   "AR"   "ES"   "10011"   "10011reto"   "reto"
voc_sales_ren   "QSA"   "AR"    "ES"    "9654236"   "9654236expe"   "expe"
voc_sales_ren   "QSA"   "BE"   "FR"   "21078"   "21078reto"   "reto"
voc_sales_ren   "QSA"   "BE"    "NL"    "027460652" "027460652expe" "expe"
voc_sales_ren   "QSA"   "BE"    "NL"    "027460652" "027460652reto" "reto"
voc_sales_ren   "QSA"   "BR"   "PT"   "41045"   "41045reto"   "reto"
voc_sales_ren   "QSA"   "BR"   "PT"   "41046"   "41046reto"   "reto"
voc_sales_ren   "QSA"   "BR"   "PT"   "41094"   "41094reto"   "reto"
voc_sales_ren   "QSA"   "DK"    "DA"    "017187074" "017187074reto" "reto"
voc_sales_ren   "QSA"   "DK"    "DA"    "017187090" "017187090reto" "reto"
voc_sales_ren   "QVN"   "DK"   "DA"   "50384"   "50384reto"   "reto"
voc_sales_ren   "QVN"   "DK"   "DA"   "50387"   "50387reto"   "reto"
voc_sales_ren   "QVN"   "BE"   "NL"   "31019"   "31019reto"   "reto"

Here's the deal:

First, it sets up the values that you want to add in the BEGIN clause. They're in an array that will be referenced later.
It splits field 5 on the double-quote character. This is a cheap and easy way to strip off the quotes.
If the numeric values of field 5 is less than or equal to 3500, it performs the changes you wanted. To do this it (a) strips off the quotes from $7, (b) adds to $5 the value set up in BEGIN corresponding to $3 and $4, (c) prepends a double quote and the new value of $5 to the alphabetic part of $7, (d) puts new quotes onto the new value of $5, and (e) rebuilds the line and prints.
If greater than 3500, just print the line.

AWK: Edit values in columns when a pattern is met

Answers (2)

Related Questions