Registered User

awk pattern matching with if

I'm trying to multiply field $2 by either .75 or .1, depending on the status in $1.

I have this data:

Disputed,279
Processed,12112
Uncollected NSF,4732
Declined,14
Invalid / Closed Account,3022

Awk statement:

#!/usr/local/bin/gawk -f

BEGIN { FPAT="([^,]*)|(\"[^\"]+\")"; FS=OFS=","; OFMT="%.2f"; }

{
        if ($1  "/Disputed|Uncollected|Invalid/")
                $3 = $2 * .75
        else 
                if ($1 ~  "/Processed|Declined/")
                $3 = $2 * 0.10
        print
}

Expected output:

Disputed,279,209.25
Processed,12112,1211.2
Uncollected NSF,4732,3549
Declined,14,1.4
Invalid / Closed Account,3022,2266.5

Current results:

Disputed,279,209.25
Processed,12112,9084
Uncollected NSF,4732,3549
Declined,14,10.5
Invalid / Closed Account,3022,2266.5

These are multiplied by .75: Disputed, Uncollected NSF and Invalid / Closed Account

These are multiplied by .1: Processed and Declined

What's causing all records to be multiplied by .75?

Edit: this is my working solution:

#!/usr/local/bin/gawk -f

BEGIN {
     FPAT="([^,]*)|(\"[^\"]+\")"
     FS=OFS=","
     OFMT="%.2f"
     print "status","acct type","count","amount"
}

NF>1 {
     $4=$3 * ($1 ~ /Processed|Declined/ ? 0.10 : 0.75 )
     print
     trans+=$3
     fee+=$4
}

END {
     printf "------------\n"
     print "# of transactions: " trans
     print "processing fee: " fee
}

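Yes, there are four fields: my real data has an extra second field (the account type) that I left out of the sample above, so the count is in $3. Here is the output: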

status,acct type,count,amount
Processed,Savings,502,50.2
Uncollected NSF,Checking,4299,3224.25
Disputed,Checking,263,197.25
Processed,Checking,11610,1161
Uncollected NSF,Savings,433,324.75
Declined,Checking,14,1.4
Invalid / Closed Account,Checking,2868,2151
Disputed,Savings,16,12
Invalid / Closed Account,Savings,154,115.5
------------
# of transactions: 20159
processing fee: 7237.35
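FPAT is a gawk extension. Assuming no field in the real data ever contains a quoted comma, a plain FS="," should do the same job and let the script run under any POSIX awk. A minimal sketch of that variant (fees is an illustrative function name; the rows are taken from the output above):

```shell
#!/bin/sh
# Sketch of the working solution above for POSIX awk, assuming no quoted
# commas in the data, so FS="," can replace gawk's FPAT.
fees() {
    awk '
    BEGIN { FS = OFS = ","; print "status", "acct type", "count", "amount" }
    NF > 1 {
        # 10% for Processed/Declined, 75% for every other status
        $4 = $3 * ($1 ~ /Processed|Declined/ ? 0.10 : 0.75)
        print
        trans += $3
        fee += $4
    }
    END {
        print "------------"
        print "# of transactions: " trans
        print "processing fee: " fee
    }'
}

printf '%s\n' \
    'Processed,Savings,502' \
    'Disputed,Checking,263' \
    'Declined,Checking,14' | fees
```

For these three rows the totals come out as 779 transactions and a processing fee of 248.85.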

Answers (3)

Ed Morton

The way to write your code in awk would be with a ternary expression, e.g.:

$ awk 'BEGIN{FS=OFS=","} {print $0, $2 * ($1 ~ /Processed|Declined/ ? 0.10 : 0.75)}' file
Disputed,279,209.25
Processed,12112,1211.2
Uncollected NSF,4732,3549
Declined,14,1.4
Invalid / Closed Account,3022,2266.5

Note that regexp constants are delimited by / (see http://www.gnu.org/software/gawk/manual/gawk.html#Regexp) but awk can construct dynamic regexps from variables and/or string constants (see http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps) so when you wrote:

"/Processed|Declined/"

in a context appropriate for a dynamic regexp ($1 ~ <regexp>), awk constructed a regexp from it as:

`/Processed` OR `Declined/`

(note the literal / chars as part of the regexp terms) instead of what you wanted:

`Processed` OR `Declined`

You can see that effect here:

$ echo 'abc' | awk '$0 ~ /b|x/'
abc
$ echo 'abc' | awk '$0 ~ "/b|x/"'
$ echo 'a/bc' | awk '$0 ~ "/b|x/"'
a/bc
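Applied to the question's own data, a small sketch (the function names quoted and constant are illustrative) prints which status values each form of the regexp selects:

```shell
#!/bin/sh
# The question's sample data (status,count).
data='Disputed,279
Processed,12112
Uncollected NSF,4732
Declined,14
Invalid / Closed Account,3022'

# String constant used as a dynamic regexp: matches "/Processed" OR "Declined/",
# with the slashes as literal characters.
quoted() {
    printf '%s\n' "$data" | awk -F, '$1 ~ "/Processed|Declined/" { print $1 }'
}

# Regexp constant: the slashes are delimiters, not part of the pattern.
constant() {
    printf '%s\n' "$data" | awk -F, '$1 ~ /Processed|Declined/ { print $1 }'
}

echo "quoted:";   quoted      # prints no records
echo "constant:"; constant    # prints Processed and Declined
```

No status value contains "/Processed" or "Declined/", so the quoted form never matches and the else branch in the question never fires.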

Now, see if you can figure this out:

$ echo 'abc' | awk '$0 ~ "/b|x/"'
$ echo 'abc' | awk '"/b|x/"'
abc

i.e. why the first one prints nothing but the second one prints the input.

ShellFish

Issue

You are missing the match operator ~. This statement:

 if ($1  "/Disputed|Uncollected|Invalid/")

always evaluates to true. Without ~, awk concatenates $1 with the string "/Disputed|Uncollected|Invalid/" and tests whether the result is non-empty, which it always is, because the string constant alone is non-empty.
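To see what that condition really tests, a small sketch (show_condition is an illustrative name) prints the concatenated string for two of the question's records along with its truth value:

```shell
#!/bin/sh
# Show what the question's if-condition evaluates: the concatenation of $1
# and the quoted regexp text, which is never empty and therefore always true.
show_condition() {
    printf '%s\n' 'Processed,12112' 'Declined,14' |
    awk -F, '{
        cond = $1 "/Disputed|Uncollected|Invalid/"   # concatenation, not a match
        print $1 ": [" cond "] -> " (cond ? "true" : "false")
    }'
}
show_condition
```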

Try instead:

if ($1 ~ /Disputed|Uncollected|Invalid/)
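The question's second test, $1 ~ "/Processed|Declined/", still has slashes inside the quotes, so fixing only the first condition is not enough on its own. A minimal sketch using three of the question's rows (partial_fix is an illustrative name) shows the Processed and Declined records then coming out with no third field:

```shell
#!/bin/sh
# The question's script with only the first condition fixed (~ plus a regexp
# constant). The second test still uses "/Processed|Declined/", slashes included.
partial_fix() {
    printf '%s\n' 'Disputed,279' 'Processed,12112' 'Declined,14' |
    awk 'BEGIN { FS = OFS = "," }
    {
        if ($1 ~ /Disputed|Uncollected|Invalid/)
            $3 = $2 * .75
        else if ($1 ~ "/Processed|Declined/")
            $3 = $2 * 0.10
        print
    }'
}
partial_fix
```

Writing the second test as $1 ~ /Processed|Declined/ as well produces the expected 1211.2 and 1.4.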

Examples

You can see this behavior using the following one-liners:

$ awk 'BEGIN { if ("" "a") { print "true" } else { print "false" }}'
true
$ awk 'BEGIN { if ("" "") { print "true" } else { print "false" }}'
false
$ awk 'BEGIN { if ("") { print "true" } else { print "false" }}'
false
$ awk 'BEGIN { if (RS FS "a") { print "true" } else { print "false" }}'
true
$ awk 'BEGIN { if (variable) { print "true" } else { print "false" }}'
false
$ awk 'BEGIN { var="0"; if (var) { print "true" } else { print "false" }}'
true
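The last two results differ because of types: an uninitialized variable is both empty and zero, so it is false, whereas var="0" holds a non-empty string, which is true. A short sketch (truthiness is an illustrative name) contrasting a numeric zero with the string "0", where adding 0 forces a numeric value:

```shell
#!/bin/sh
# A variable's truth value depends on its type: a numeric 0 is false,
# while the string "0" is non-empty and therefore true.
truthiness() {
    awk 'BEGIN {
        n = 0
        s = "0"
        print "number 0: " (n ? "true" : "false")
        print "string 0: " (s ? "true" : "false")
        print "string 0 + 0: " ((s + 0) ? "true" : "false")
    }'
}
truthiness
```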

pobrelkey

As the other poster said, you left out the ~ operator before the first regular expression.

Also, don't include slashes at the start and end of your regular expressions. Enclose each regular expression either in slashes (as in Perl/Ruby/JavaScript) or in quotes, not both.
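Either form selects the same records, as this small sketch on a few of the question's rows shows (awk's ~ yields 1 for a match and 0 otherwise; the function names are illustrative). One practical difference, per the gawk manual's section on computed regexps: inside a string, a backslash must itself be escaped, so /\./ and "\\." both match a literal dot.

```shell
#!/bin/sh
# Slashes or quotes, not both: these two tests select the same records.
match_both() {
    printf '%s\n' 'Disputed,279' 'Processed,12112' 'Declined,14' |
    awk -F, '{ print $1, ($1 ~ /Processed|Declined/), ($1 ~ "Processed|Declined") }'
}
match_both

# In a string, the backslash needs escaping: /\./ and "\\." both mean a literal dot.
dot_test() {
    awk 'BEGIN {
        a = ("a.b" ~ /\./)
        b = ("a.b" ~ "\\.")
        c = ("ab" ~ "\\.")
        print a, b, c
    }'
}
dot_test
```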

if ($1 ~ "Disputed|Uncollected|Invalid")
    $3 = $2 * .75
else
    if ($1 ~  "Processed|Declined")
        $3 = $2 * 0.10
print
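For completeness, here is that corrected if/else wrapped in a complete awk program and run on the question's sample data, omitting the gawk-only FPAT setting, which this unquoted data doesn't need. The function name apply_fees is just for illustration; the output should match the expected output in the question.

```shell
#!/bin/sh
# The corrected if/else above, wrapped in a complete awk program.
apply_fees() {
    awk 'BEGIN { FS = OFS = "," }
    {
        if ($1 ~ "Disputed|Uncollected|Invalid")
            $3 = $2 * .75
        else if ($1 ~ "Processed|Declined")
            $3 = $2 * 0.10
        print
    }'
}

printf '%s\n' \
    'Disputed,279' \
    'Processed,12112' \
    'Uncollected NSF,4732' \
    'Declined,14' \
    'Invalid / Closed Account,3022' | apply_fees
```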
