joshu

Reputation: 463

Regular Expressions in AWK

I am trying to parse the following input using awk patterns:

Smith, Jim 12.34

12.34 Jim Smith

I have one pattern that checks whether the first field contains an alphabetic character, the second field contains an alphabetic character, and the third field contains a digit, and a second pattern that checks for the second case, like so:

$1 ~ /[A-Za-z]/ && $2 ~ /[A-Za-z]/ && $3 ~ /[0-9]/{
do fun things with record
}
$3 ~ /[A-Za-z]/ && $2 ~ /[A-Za-z]/ && $1 ~ /[0-9]/
{
this is the second form of the record
}

However, my program appears to pass both checks and execute both actions. I have been trying to figure out where I am going wrong, but the same thing keeps happening. Any pointers in the right direction are much appreciated. I know there are many ways to do this, a few of which I have found, but I would like to know specifically what I am doing wrong here.

I'm running CentOS 7 with awk:

gawk --version
GNU Awk 4.0.2

Upvotes: 3

Views: 889

Answers (2)

matz

Reputation: 656

The problem is the newline before the opening brace after the second pattern. This will work as expected:

$1 ~ /[A-Za-z]/ && $2 ~ /[A-Za-z]/ && $3 ~ /[0-9]/{
 print "do fun things with record"
}
$3 ~ /[A-Za-z]/ && $2 ~ /[A-Za-z]/ && $1 ~ /[0-9]/{ # NO newline here
 print "this is the second form of the record"
}

Explanation: An AWK program consists of a sequence of pattern { action } pairs, where either the pattern or the action can be omitted. A newline between the pattern and the opening brace makes awk parse them as a pattern with no action (which defaults to printing the record), followed by an action with no pattern (i.e., an action that is executed unconditionally, for every record).
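To make the parsing rule above concrete, here is a minimal sketch that runs the question's two sample lines through both layouts. The variable names (input, broken, fixed) and the printed labels are illustrative assumptions, not part of the original program.

```shell
#!/bin/sh
# Sample records copied from the question.
input='Smith, Jim 12.34
12.34 Jim Smith'

# Brace on its own line: the second pattern has no action, so awk falls back
# to the default action (print the record), and the bare block that follows
# has no pattern, so it runs for every record.
broken=$(printf '%s\n' "$input" | awk '
$1 ~ /[A-Za-z]/ && $2 ~ /[A-Za-z]/ && $3 ~ /[0-9]/ { print "first form" }
$3 ~ /[A-Za-z]/ && $2 ~ /[A-Za-z]/ && $1 ~ /[0-9]/
{ print "second form" }')

# Brace on the same line as the pattern: each action is tied to its pattern.
fixed=$(printf '%s\n' "$input" | awk '
$1 ~ /[A-Za-z]/ && $2 ~ /[A-Za-z]/ && $3 ~ /[0-9]/ { print "first form" }
$3 ~ /[A-Za-z]/ && $2 ~ /[A-Za-z]/ && $1 ~ /[0-9]/ { print "second form" }')

printf 'brace on next line:\n%s\n\n' "$broken"
printf 'brace on same line:\n%s\n' "$fixed"
```

With the brace on its own line, "second form" is printed for both records and the second record is also echoed by the default action; with the brace on the pattern's line, each record triggers exactly one action.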

Bottom line: stick to Egyptian brackets in AWK, with the opening brace on the same line as the pattern.

Upvotes: 5

karakfa

Reputation: 67497

If a field contains both alphabetic and numeric characters, it will pass both tests. For example:

$ echo "James007" | awk '/[a-zA-Z]/{print "alpha"} /[0-9]/{print "number"}'

will print both "alpha" and "number". To require that a field consist only of letters or only of digits, anchor the patterns with ^ and $:

$ echo "James 007" | awk '$1~/^[a-zA-Z]+$/{print "alpha"} $2~/^[0-9]+$/{print "number"}'
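As a rough sketch of how the anchoring idea could be applied to the two record layouts from the question: the exact regular expressions here are assumptions (names are letters with an optional trailing comma, numbers are digits with an optional decimal part), as are the output labels, the classified variable, and the third test record.

```shell
#!/bin/sh
# First two records are from the question; the third is a made-up mixed field.
classified=$(printf '%s\n' 'Smith, Jim 12.34' '12.34 Jim Smith' 'James007 Bond 7' | awk '
# Anchored patterns: the whole field must match, not just one character in it.
$1 ~ /^[A-Za-z]+,?$/ && $2 ~ /^[A-Za-z]+,?$/ && $3 ~ /^[0-9]+(\.[0-9]+)?$/ {
    print "name-first: " $0; next
}
$1 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 ~ /^[A-Za-z]+,?$/ && $3 ~ /^[A-Za-z]+,?$/ {
    print "number-first: " $0; next
}
# Anything that matches neither layout ends up here.
{ print "unrecognized: " $0 }')

printf '%s\n' "$classified"
```

The next statement after each match stops further rules from running on that record, so the final pattern-less block acts as a catch-all only for lines that match neither form.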

Upvotes: 1
