juanschwartz

Reputation: 91

Order of operations in awk?

I am trying to use awk to do two things. I want to split a list into three separate lists and convert one or two columns of each into a regular expression. When I pipe awk into itself, i.e. select the items in my list and then use a second awk to do the substitutions, it appends 1s to the list items.

I figure I need to not pipe awk to itself and instead do all of this in a single call to awk.

AH??0*,*,ARRAY RESISTIVITY,RESISTIVITY
AHD*,*,MEASURED DEPTH,REFERENCE
AI*,*,ACOUSTIC IMPEDANCE COMPRESSIONAL,GEOPHYSICAL SYNTHETICS
AI_AVG_HOR_SIG,*,ACOUSTIC IMPEDANCE,ACOUSTIC
*,FOO,BAR,BLEH

List one would be lines like line 4, with no wildcards in column one, replacing wildcards in column 2.

List two would be for lines 1,2 and 3 in a separate list and will need to do substitutions on columns 1 and 2.

Lastly, I need to do a similar thing for line 5 in a separate list.

I am able to get these lists by doing this:

Line 4: awk -F \, '$1!~/([\*\?])/' file.txt
Lines 1-3: awk -F \, '$1~/([\*\?])/' file.txt
Line 5: awk -F \, '$1~/^\*$/' file.txt

My subs are * => .* and ? => [0-9].

When I attempt to use gsub like this: awk -F \, 'gsub(/\*/,".*",$2) $1!~/([\*\?])/' OFS=, file.txt, the list comes back with unexpected results. I feel as though there is something fundamental I don't understand about how awk stacks operations.

Halp!

Upvotes: 0

Views: 461

Answers (1)

JJoao

Reputation: 5347

What I write here is not a solution to your question. It is just an exercise in reorganizing your versions (for you to complete :). Some of @Etan's wise suggestions are still missing. (Stylistic concerns can save us lots of time.)

awk (or any one-liner solution) gets confusing if it exceeds some 30 chars. Quotes, etc. become difficult.
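One concrete trap worth keeping in mind: gsub returns the number of substitutions it made, not the modified string. So anything that prints or concatenates its return value will show a count (often 1). That is a plausible source of the stray 1s, though I am only guessing at what the piped version looked like. Also, in a pattern like gsub(...) $1 !~ /re/ there is no operator between gsub(...) and $1, so awk concatenates them before applying !~. A minimal sketch of the return value, using one line from the question (the variable names n and out are just for illustration):

```shell
# gsub edits $2 in place and returns how many replacements it made.
out=$(printf 'AI_AVG_HOR_SIG,*,ACOUSTIC IMPEDANCE,ACOUSTIC\n' |
      awk -F, -v OFS=, '{ n = gsub(/\*/, ".*", $2); print n; print }')
echo "$out"
```

The first line printed is the count, the second is the record with the substitution applied.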

You can (should?) write it in a file (a.awk) with proper indentation, comments, vertical symmetries:

#!/usr/bin/gawk -f

BEGIN                          { FS="," ; OFS=","     }

$1 ~ /[\*\?]/ && $1 !~ /^\*$/  { gsub(/\*/, ".*"   ,$1 );
                                 gsub(/\?/, "[0-9]",$1 );
                                 gsub(/\*/, ".*"   ,$2 );
                                 print; }

and use it as awk -f a.awk inputfile

The current behavior is:

echo 'AH??0*,*,ARRAY RESISTIVITY,RESISTIVITY
AHD*,*,MEASURED DEPTH,REFERENCE
AI*,*,ACOUSTIC IMPEDANCE COMPRESSIONAL,GEOPHYSICAL SYNTHETICS
AI_AVG_HOR_SIG,*,ACOUSTIC IMPEDANCE,ACOUSTIC
*,FOO,BAR,BLEH' | awk -f a.awk

AH[0-9][0-9]0.*,.*,ARRAY RESISTIVITY,RESISTIVITY
AHD.*,.*,MEASURED DEPTH,REFERENCE
AI.*,.*,ACOUSTIC IMPEDANCE COMPRESSIONAL,GEOPHYSICAL SYNTHETICS
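If you want to go further, here is one possible way to get all three lists in a single awk call. It is only a sketch under a few assumptions of mine: the output file names (literal.txt, wild.txt, bare.txt) and the temp directory are made up, both substitutions are applied to columns 1 and 2 of every list, and a bare * in column 1 simply becomes .* like everywhere else. The key point is to classify each line on the original $1 before gsub changes it.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, used only for this sketch.
dir=$(mktemp -d)

# Sample input copied from the question.
cat > "$dir/file.txt" <<'EOF'
AH??0*,*,ARRAY RESISTIVITY,RESISTIVITY
AHD*,*,MEASURED DEPTH,REFERENCE
AI*,*,ACOUSTIC IMPEDANCE COMPRESSIONAL,GEOPHYSICAL SYNTHETICS
AI_AVG_HOR_SIG,*,ACOUSTIC IMPEDANCE,ACOUSTIC
*,FOO,BAR,BLEH
EOF

awk -v dir="$dir" '
BEGIN { FS = OFS = "," }
{
    # Pick the output list from the ORIGINAL first column.
    if      ($1 == "*")   out = dir "/bare.txt"      # column 1 is a lone *
    else if ($1 ~ /[*?]/) out = dir "/wild.txt"      # column 1 has wildcards
    else                  out = dir "/literal.txt"   # column 1 has none

    # Then apply the substitutions: * => .* and ? => [0-9]
    gsub(/\*/, ".*", $1); gsub(/\?/, "[0-9]", $1)
    gsub(/\*/, ".*", $2); gsub(/\?/, "[0-9]", $2)

    print > out
}' "$dir/file.txt"
```

Each record lands in exactly one file, so there is no need to pipe awk into itself or to run three separate passes.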

Upvotes: 1
