user3676305

Reputation: 65

UNIX AWK script - memory exhausted

I have an input CSV file that looks like this:

123456,ABC,A,,,
123457,DEF,A,H,,
1234568,GHI,,H,,
111111,AAA,A,,,
12345699,XYZ,A,H,,

I also have an AWK script containing the lines below, a chain of if/else conditions:

BEGIN { FS=","}
{ 
variable=$1.","$2;
if(variable ~ /^123456.+,ABC/) print "P," $0; else
if(variable ~ /^123457.+,DEF/) print "P," $0; else
if(variable ~ /^123458.+,GHI/) print "R," $0; else
if(variable ~ /^1234599.+,XYZ/) print "P," $0; else print "U" ","  $0;} 
END { }

After running this AWK script on my input file, I get the following output:

P,123456,ABC,A,,,
P,123457,DEF,A,H,,
R,1234568,GHI,,H,,
U,111111,AAA,A,,,
P,12345699,XYZ,A,H,,

Everything ran fine until I had to add more if conditions to this AWK script (around 3500). Now it throws a 'memory exhausted' error:

awk: script.awk:1259: if(variable ~ /^123311.+,AB23/) print "P," $0; else
awk: script.awk:1259:                                              ^ memory exhausted

Now the interesting part: first, the memory exhausted error always occurs at line 1259, and second, when I remove the if conditions from line 1259 onward (including line 1259), the script runs smoothly again. Is there a limit on the number of if conditions inside an AWK/GAWK script?

The AWK version I am using is:

GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.3, GNU MP 6.1.0)

Upvotes: 4

Views: 598

Answers (3)

Ed Morton

Reputation: 203712

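I doubt there's a limit on how many stand-alone ifs your code can contain, but there may be a limit on if-else chains, since a whole chain is essentially one long statement. (gawk's parser is generated by Bison, and "memory exhausted" is the message a Bison parser reports when its parse stack overflows, which would be consistent with one very long nested statement.)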
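A quick way to test that hypothesis on your own awk is to generate both shapes mechanically. The sketch below is a hypothetical helper, not part of the original answer: gen_chain writes one if/else-if chain of N tests in the question's style, gen_rules writes N independent pattern-action rules in the style used below, and the prefixes (100001, 100002, ...) and second-field values (K1, K2, ...) are made-up test data. Whether the chained file is rejected depends on your awk build and version; the question's gawk 4.1.3 stopped at line 1259.

```shell
#!/bin/sh
# Hypothetical generators for a synthetic test of the "one long statement"
# hypothesis. Prefixes 100001.. and field-2 values K1.. are made-up data.

# One if/else-if chain of N tests (N = first argument), as in the question.
gen_chain() {
    awk -v n="$1" 'BEGIN {
        print "BEGIN { FS = \",\" }"
        print "{ v = $1 FS $2"
        for (i = 1; i <= n; i++)
            printf "  if (v ~ /^%d.+,K%d/) print \"P,\" $0; else\n", 100000 + i, i
        print "  print \"U,\" $0 }"
    }'
}

# N separate pattern-action rules; "next" stops at the first match.
gen_rules() {
    awk -v n="$1" 'BEGIN {
        print "BEGIN { FS = OFS = \",\" }"
        print "{ v = $1 FS $2 }"
        for (i = 1; i <= n; i++)
            printf "v ~ /^%d.+,K%d/ { print \"P\", $0; next }\n", 100000 + i, i
        print "{ print \"U\", $0 }"
    }'
}

# Usage:
#   gen_rules 3500 > rules_form.awk
#   gen_chain 3500 > chain_form.awk
#   printf '1000779,K77\n999,ZZ\n' | awk -f rules_form.awk
#   printf '1000779,K77\n' | awk -f chain_form.awk || echo "chain form rejected"
```

If the rules form runs while the chain form fails at parse time, that points at the if-else chain itself rather than at the number of conditions.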

Try this to see whether you still have the problem:

BEGIN { FS=OFS=","}
{ variable = $1 FS $2 }
variable ~ /^123456.+,ABC/  { print "P", $0; next }
variable ~ /^123457.+,DEF/  { print "P", $0; next }
variable ~ /^123458.+,GHI/  { print "R", $0; next }
variable ~ /^1234599.+,XYZ/ { print "P", $0; next }
{ print "U",  $0 } 

I also cleaned up a few other things that should have no impact on your problem.

If you can't use next because your script needs to do something else with each record later on, then:

BEGIN { FS=OFS=","}
{ variable = $1 FS $2; f=0 }
!f && variable ~ /^123456.+,ABC/  { print "P", $0; f=1 }
!f && variable ~ /^123457.+,DEF/  { print "P", $0; f=1 }
!f && variable ~ /^123458.+,GHI/  { print "R", $0; f=1 }
!f && variable ~ /^1234599.+,XYZ/ { print "P", $0; f=1 }
!f { print "U",  $0 } 

would be another way to get rid of the elses.

Note that I'm not suggesting any of this is a reasonable approach to whatever you're trying to do, but I don't know enough about your real goal to suggest another approach, so the above just focuses on helping you get around the error message syntactically.

Upvotes: 2

James Brown

Reputation: 37414

I don't know whether GNU awk has a limit on ifs, but don't put so many ifs in your code. Instead, solve it with data: keep the rules in a separate file and look them up in a hash, a bit like this (it's just a quick draft):

$ cat rules   # put your logic here
P,123456,ABC
P,123457,DEF
R,1234568,GHI

The code:

$ awk '
BEGIN { FS=OFS="," }                       
NR==FNR {                                  # read in the rules file
    a[$2","$3]=$1                          # and hash it
    next
}
{                                          # read the input file
    print (($1","$2) in a ? a[$1","$2] : "U"), $0   # print the code from the hash, or U if not found
}' rules input                             # mind the order
P,123456,ABC,A,,,
P,123457,DEF,A,H,,
R,1234568,GHI,,H,,
U,111111,AAA,A,,,
U,12345699,XYZ,A,H,,
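With around 3500 existing conditions, writing the rules file by hand would be tedious. The following is a hypothetical one-off converter, not part of the original answer, that assumes every condition sits on a single line shaped exactly like the ones in the question (if(variable ~ /^PREFIX.+,FIELD2/) print "CODE," $0; else) and prints CODE,PREFIX,FIELD2. The extracted PREFIX is only the leading part of $1, not the whole field, so the output suits a prefix lookup rather than the exact-key lookup above, and the prefixes are not necessarily all six characters long.

```shell
#!/bin/sh
# Sketch: turn the question's if/else chain into a rules file.
# Assumes each condition is on one line shaped like
#   if(variable ~ /^PREFIX.+,FIELD2/) print "CODE," $0; else
# if_chain_to_rules is a hypothetical helper; lines without a condition are skipped.
if_chain_to_rules() {
    awk '
    index($0, "variable ~ /^") {
        split($0, s, "/")             # s[2] is the regex text, e.g. ^123456.+,ABC
        split($0, q, "\"")            # q[2] is the printed code, e.g. P,
        code = q[2]
        sub(/,$/, "", code)           # "P," -> "P"
        prefix = substr(s[2], 2)      # drop the leading ^
        sub(/[.][+].*/, "", prefix)   # keep only the digits before .+
        f2 = s[2]
        sub(/^[^,]*,/, "", f2)        # keep what follows the comma
        print code "," prefix "," f2
    }' "$@"
}

# Usage (file names are placeholders):
#   if_chain_to_rules script.awk > rules
```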

Edit:

If you use GNU awk, you can store only the beginning of $1 together with $2 in a 2D array (arrays of arrays are a GNU awk extension) and achieve something like this:

$ cat rules   # put your logic here, notice 1st and 3rd
P,123456,ABC
P,123457,DEF
R,123456,GHI

The code:

$ awk '
BEGIN { FS=OFS="," }
NR==FNR {
    a[$2][$3]=$1
    next
}
{
    p=substr($1,1,6)
    print (p in a && $2 in a[p] ? a[p][$2] : "U"),$0
}' rules input
P,123456,ABC,A,,,    # matches 1st record in rules file
P,123457,DEF,A,H,,   # 2nd
R,1234568,GHI,,H,,   # 3rd
U,111111,AAA,A,,,    # no match
U,12345699,XYZ,A,H,, # 123456 would match but XYZ won't
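Arrays of arrays (a[p][$2]) only work in GNU awk, and the fixed substr($1,1,6) assumes every prefix is six characters long, while the question's own conditions include a seven-digit one (1234599). A possible POSIX-awk variant, sketched here under the assumption that the rules file keeps the CODE,PREFIX,FIELD2 layout above, joins the two keys with SUBSEP (a[p, $2]) and tries every prefix length that occurs in the rules file, longest first, so the most specific rule wins. The function name classify is hypothetical.

```shell
#!/bin/sh
# Sketch: prefix lookup without gawk arrays of arrays.
# Rules file lines: CODE,PREFIX,FIELD2   Input: CSV records.
# classify is a hypothetical helper; usage: classify rules input
classify() {
    awk '
    BEGIN { FS = OFS = "," }
    NR == FNR {                        # first file: the rules
        a[$2, $3] = $1                 # key is PREFIX SUBSEP FIELD2
        len[length($2)] = 1            # remember which prefix lengths occur
        if (length($2) > maxlen) maxlen = length($2)
        next
    }
    {
        code = "U"                     # default when no rule matches
        for (n = maxlen; n > 0; n--) { # longest prefix first
            if (!(n in len)) continue
            p = substr($1, 1, n)
            if ((p, $2) in a) { code = a[p, $2]; break }
        }
        print code, $0
    }' "$1" "$2"
}
```

For example, with an extra rule P,1234569,XYZ added to the rules file above, the last input line (12345699,XYZ,...) would get P instead of U, while the other four lines keep their codes.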

Upvotes: 2

Abhinandan prasad

Reputation: 1079

Try this:

awk -F',' '{
    if ($1$2 ~ /^123456.*ABC/ || $1$2 ~ /^123457.*DEF/ || $1$2 ~ /^12345699.*XYZ/ || $1$2 ~ /^123311.*AB23/)
        print "P," $0
    else if ($1$2 ~ /^1234568.*GHI/)
        print "R," $0
    else
        print "U," $0
}' file

Upvotes: -1
