ericj

Reputation: 2301

How to pass a regular expression to a function in AWK

I do not know how to pass a regular expression as an argument to a function.

Passing a string works fine.

I have the following awk file:

#!/usr/bin/awk -f

function find(name){
    for(i=0;i<NF;i++)if($(i+1)~name)print $(i+1)
}

{
    find("mysql")
}    

I run it like this:

$ ./fct.awk <(echo "$str")

This works OK.

But when I make this call in the awk file instead:

{
    find(/mysql/)
}  

This does not work.

What am I doing wrong?

Thanks,

Eric J.

Upvotes: 4

Views: 1072

Answers (4)

RARE Kpop Manifesto

Reputation: 2875

Use quotation marks and treat the pattern as a string. This works in mawk, mawk2, and GNU gawk. You will also need to double the backslashes, since awk consumes one level of them when it reads the string.

In your example, just find("mysql") will suffice.

You can pass an arbitrary regex this way without being confined to GNU gawk, as long as you are willing to write it as a string rather than with the @/../ syntax others have mentioned. This is where the number of backslashes makes a difference.
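To illustrate the backslash doubling, here is a minimal sketch. The wrapper name dotted_fields and the sample input are made up for the demo; the find function is modeled on the one in the question. Inside an awk string, "\\." turns into the regex \. once the string has been read, so it matches a literal dot.

```shell
# Hypothetical helper: print every field that contains a literal dot.
dotted_fields() {
    awk '
    function find(re,    i) {        # i is declared as a local variable
        for (i = 1; i <= NF; i++)
            if ($i ~ re) print $i
    }
    { find("\\.") }                  # string form of the constant /\./
    '
}

printf 'a.b abc 10.5 x\n' | dotted_fields
```

This should print a.b and 10.5, one per line. With a single backslash ("\.") the result is not portable: some awks warn about the unknown escape, and depending on how the implementation resolves it the pattern may end up as a bare dot, which matches any character.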

You can even build a regex out of arbitrary bytes, preferably via octal codes. If you use "\342\234\234" as a regex, awk converts it into the actual bytes before matching.

While there is nothing wrong with that approach, if you want to be 100% safe and prefer not to have arbitrary bytes flying around, write it as

"[\\342][\\234][\\234]"  ----> ✜

Once awk has read the string and built its internal representation, the regex looks like this:

[\342][\234][\234]

which will still match the same thing (in this case, a cross-shaped dingbat). This produces annoying warnings in gawk's Unicode-aware mode, because it encloses non-ASCII bytes directly in square brackets. For that use case,

"\\342\\234\\234" ------(eqv to )--->  /\342\234\234/

will keep gawk happy and quiet. Lately I've been filling the gaps in my own code by writing regexes that mimic all the Unicode script classes that Perl enjoys.

Upvotes: 0

oliv

Reputation: 13259

If you use GNU awk, you can pass a regular expression as a parameter to a user-defined function.
You have to write the regex as @/.../.

In your example, you would use it like this:

function find(regex){
    for(i=1;i<=NF;i++)
            if($i ~ regex)
                    print $i
}

{
    find(@/mysql/)
}    

It's called a strongly typed regexp constant, and it has been available since GNU awk version 4.2 (Oct 2017).

Example here.

Upvotes: 1

Kent

Reputation: 195209

You cannot (and should not) pass a regex constant to a user-defined function. You have to use a dynamic regex in this case, like find("mysql").

If you call find(/mysql/), what awk actually does is find($0~/mysql/), so it passes 0 or 1 to your find(..) function.
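A quick way to see this for yourself is the sketch below. The names show_arg and show are invented for the demo, and stderr is discarded only to hide the warning gawk prints for this kind of call.

```shell
# Hypothetical demo: print what a user-defined function actually receives
# when it is called with a regex constant.
show_arg() {
    awk '
    function show(arg) { print "arg = " arg }
    { show(/mysql/) }                # evaluated as show($0 ~ /mysql/)
    ' 2>/dev/null                    # hide the gawk warning about this call
}

printf 'mysql server\npostgres server\n' | show_arg
```

The first line contains mysql, so the function should receive 1. The second does not, so it should receive 0. In neither case does the regex itself reach the function.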

See this question for details:

awk variable assignment statement explanation needed

See also http://www.gnu.org/software/gawk/manual/gawk.html#Using-Constant-Regexps

section: 6.1.2 Using Regular Expression Constants

Upvotes: 5

Karoly Horvath

Reputation: 96286

warning: regexp constant for parameter #1 yields boolean value

The regex gets evaluated (matching against $0) before it's passed to the function. You have to use strings.

Note: make sure you do proper escaping: http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps
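As a rough sketch of the string approach, assuming the pattern comes from the shell: the wrapper find_fields and the variable pat are illustrative names, not anything from the question. The pattern is handed to awk as a string with -v and then used as a dynamic regex. Note that -v also processes escape sequences in the value, so backslashes need doubling there as well.

```shell
# Hypothetical wrapper: print every field that matches the pattern in $1.
find_fields() {
    awk -v pat="$1" '
    function find(re,    i) {        # re is a string used as a dynamic regex
        for (i = 1; i <= NF; i++)
            if ($i ~ re) print $i
    }
    { find(pat) }
    '
}

printf 'apache mysqld nginx mysql-client\n' | find_fields 'mysql'
```

For this input it should print mysqld and mysql-client, one per line.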

Upvotes: 4
