Reputation: 65
I have an input CSV file that looks like this:
123456,ABC,A,,,
123457,DEF,A,H,,
1234568,GHI,,H,,
111111,AAA,A,,,
12345699,XYZ,A,H,,
I have an AWK script containing the lines below, with multiple if conditions:
BEGIN { FS=","}
{
variable=$1.","$2;
if(variable ~ /^123456.+,ABC/) print "P," $0; else
if(variable ~ /^123457.+,DEF/) print "P," $0; else
if(variable ~ /^123458.+,GHI/) print "R," $0; else
if(variable ~ /^1234599.+,XYZ/) print "P," $0; else print "U" "," $0;}
END { }
After running this AWK script on my input file, I get the following output:
P,123456,ABC,A,,,
P,123457,DEF,A,H,,
R,1234568,GHI,,H,,
U,111111,AAA,A,,,
P,12345699,XYZ,A,H,,
Everything ran fine until I had to add more if conditions to this AWK script (around 3500). Now it throws a 'memory exhausted' error:
awk: script.awk:1259: if(variable ~ /^123311.+,AB23/) print "P," $0; else
awk: script.awk:1259: ^ memory exhausted
Now the interesting part: first, the memory exhausted error always occurs at line 1259; second, when I remove the if conditions from line 1259 onward (including line 1259 itself), the script runs smoothly again. Is there any limit on the number of if conditions inside an AWK/GAWK script?
The AWK version I am using is:
GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.3, GNU MP 6.1.0)
Upvotes: 4
Views: 598
Reputation: 203712
I doubt there's a limit on how many stand-alone ifs there are in your code, but maybe there's a limit on if-elses, since that's essentially just one long statement.
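To illustrate why a chain of else-ifs counts as one statement: each else owns the if that follows it, so the two forms below are the same program, and every extra branch nests one level deeper. This is only a sketch on a few of the question's sample rows, with the patterns simplified for the demonstration; whether that nesting depth is what actually exhausts memory in gawk is an assumption I have not verified.

```shell
# Three sample rows taken from the question's input.
input='123456,ABC,A,,,
123457,DEF,A,H,,
111111,AAA,A,,,'

# Flat-looking chain, written the way the question writes it.
chain=$(printf '%s\n' "$input" | awk -F, '{
    key = $1 "," $2
    if (key ~ /^123456,ABC/) print "P," $0; else
    if (key ~ /^123457,DEF/) print "P," $0; else
    print "U," $0
}')

# The same logic with the nesting made explicit:
# every "else" body contains the whole rest of the chain.
nested=$(printf '%s\n' "$input" | awk -F, '{
    key = $1 "," $2
    if (key ~ /^123456,ABC/) {
        print "P," $0
    } else {
        if (key ~ /^123457,DEF/) {
            print "P," $0
        } else {
            print "U," $0
        }
    }
}')

[ "$chain" = "$nested" ] && echo "same output"
```

With 3500 branches the second form would be indented 3500 levels deep, which is the structure the parser has to hold for the first form too.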
Try this to see if you still have a problem or not:
BEGIN { FS=OFS=","}
{ variable = $1 "." FS $2 }
variable ~ /^123456.+,ABC/ { print "P", $0; next }
variable ~ /^123457.+,DEF/ { print "P", $0; next }
variable ~ /^123458.+,GHI/ { print "R", $0; next }
variable ~ /^1234599.+,XYZ/ { print "P", $0; next }
{ print "U", $0 }
I also cleaned up a few other things that should have no impact on your problem.
If you can't do the above due to needing to do something else later in your script then:
BEGIN { FS=OFS=","}
{ variable = $1 "." FS $2; f=0 }
!f && variable ~ /^123456.+,ABC/ { print "P", $0; f=1 }
!f && variable ~ /^123457.+,DEF/ { print "P", $0; f=1 }
!f && variable ~ /^123458.+,GHI/ { print "R", $0; f=1 }
!f && variable ~ /^1234599.+,XYZ/ { print "P", $0; f=1 }
!f { print "U", $0 }
would be another way to get rid of the elses.
Note that I'm not suggesting any of this is a reasonable approach to whatever it is you're trying to do. I don't know enough about your real goal to suggest another approach, so the above just focuses on getting you syntactically past the error message you're seeing.
Upvotes: 2
Reputation: 37414
I don't know whether GNU awk has a limit on ifs, but don't put that many ifs in your code. Instead, move the logic into data, a bit like this (it's just a quick draft):
$ cat rules # put your logic here
P,123456,ABC
P,123457,DEF
R,1234568,GHI
The code:
$ awk '
BEGIN { FS=OFS="," }
NR==FNR { # read in the rules file
a[$2","$3]=$1 # and hash it
next
}
{ # read the input file
print ($1","$2 in a?a[$1","$2]:"U"),$0 # look up the code in the hash; print it, or U if not found
}' rules input # mind the order
P,123456,ABC,A,,,
P,123457,DEF,A,H,,
R,1234568,GHI,,H,,
U,111111,AAA,A,,,
U,12345699,XYZ,A,H,,
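The lookup above only works for exact keys. If the rules have to stay regular expressions, as in the question, a sketch along the same lines could keep them in a file and loop over them in order. The file names rules.re and input.csv, the tab-separated layout (result code, tab, regex) and the sample patterns are all assumptions made for this illustration, not something taken from the question.

```shell
tmp=$(mktemp -d)

# Hypothetical rules file: result code, a tab, then a regex tested against "$1,$2".
printf 'P\t^123456.*,ABC$\nP\t^123457.*,DEF$\nR\t^123456.*,GHI$\n' > "$tmp/rules.re"

# Sample input from the question.
printf '123456,ABC,A,,,\n123457,DEF,A,H,,\n1234568,GHI,,H,,\n111111,AAA,A,,,\n12345699,XYZ,A,H,,\n' > "$tmp/input.csv"

out=$(awk '
BEGIN { FS = OFS = "," }
NR == FNR {                       # first file: load the rules in order
    tab = index($0, "\t")         # split on the tab by hand; the regex may contain commas
    code[++n] = substr($0, 1, tab - 1)
    re[n]     = substr($0, tab + 1)
    next
}
{                                 # second file: the first matching rule wins
    key = $1 "," $2
    result = "U"                  # default when no rule matches
    for (i = 1; i <= n; i++)
        if (key ~ re[i]) { result = code[i]; break }
    print result, $0
}' "$tmp/rules.re" "$tmp/input.csv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

This tests every rule against every record, so it trades the parser problem for a loop that is linear in the number of rules; for 3500 rules that may or may not be acceptable, depending on the input size.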
Edit:
If you use GNU awk and store only the beginning of $1, together with $2, in a 2D array, you can achieve something like this:
$ cat rules # put your logic here, notice 1st and 3rd
P,123456,ABC
P,123457,DEF
R,123456,GHI
The code:
$ awk '
BEGIN { FS=OFS="," }
NR==FNR {
a[$2][$3]=$1
next
}
{
p=substr($1,1,6)
print (p in a && $2 in a[p] ? a[p][$2] : "U"),$0
}' rules input
P,123456,ABC,A,,, # matches 1st record in rules file
P,123457,DEF,A,H,, # 2nd
R,1234568,GHI,,H,, # 3rd
U,111111,AAA,A,,, # no match
U,12345699,XYZ,A,H,, # 123456 would match but XYZ won't
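The a[$2][$3] syntax needs GNU awk's arrays of arrays. As a sketch for other awks, the same lookup can be written with the classic comma subscript, which joins the two keys with SUBSEP. The 6-character prefix is carried over from the code above; the temporary directory is only there to make the example self-contained.

```shell
tmp=$(mktemp -d)

# Same rules and input as above, written to a scratch directory.
printf 'P,123456,ABC\nP,123457,DEF\nR,123456,GHI\n' > "$tmp/rules"
printf '123456,ABC,A,,,\n123457,DEF,A,H,,\n1234568,GHI,,H,,\n111111,AAA,A,,,\n12345699,XYZ,A,H,,\n' > "$tmp/input"

out=$(awk '
BEGIN { FS = OFS = "," }
NR == FNR {                  # rules file: key is (prefix, name), value is the code
    a[$2, $3] = $1
    next
}
{                            # input file: look up the first 6 characters of $1 plus $2
    p = substr($1, 1, 6)
    code = ((p, $2) in a) ? a[p, $2] : "U"
    print code, $0
}' "$tmp/rules" "$tmp/input")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The result is assigned to a variable before printing to sidestep any ambiguity around a parenthesized first argument to print.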
Upvotes: 2
Reputation: 1079
Try this:
awk -F',' '{
  key = $1 $2
  if (key ~ /^123456+ABC/ || key ~ /^123457+DEF/ || key ~ /^12345699+XYZ/ || key ~ /^123311+AB23/)
    print "P," $0
  else if (key ~ /^1234568+GHI/)
    print "R," $0
  else
    print "U," $0
}' file
Upvotes: -1