Amit

Reputation: 126

Reuse patterns in awk program

I want to write a fairly long awk program, so I want my code to be readable and easy to maintain. The first code snippet works, but it is hard to read and harder to maintain because the regex is repeated.

/\(..-av-es\/.*\)/ {
    split($0, arr, /\(..-av-es\/.*\)/)
}

Therefore I would like to define the regex once in a variable and reuse that variable. $0 ~ PATTERN {...} works, but split($0, arr, PATTERN) doesn't. What exactly am I doing wrong?

BEGIN { PATTERN="\(..-av-es\/.*\)"}

$0 ~ PATTERN {
    split($0, arr, PATTERN)

}

EDIT: I have a file structured like this.

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
abc (fd-av-es/key1) value1sdfsdaff
jjjjjjjjjjjjjjjjjjjjjjjjjjj
(sd-av-es/key2) value2sdfsdaff 

My final goal is an array of strings "key1:value1" "key2:value2".

This snippet

/\(..-av-es\/.*\)/ {
    split($0, arr, /\(..-av-es\/.*\)/)
    for ( i in arr) {print NR arr[i]}
}

returns the following, which brings me a little closer to value1 and value2:

2abc
2 value1afjskhslakjhf
4
4 value2jkalshfkjkl

but this version

BEGIN { PATTERN="\(..-av-es\/.*\)"}
$0 ~ PATTERN {
    split($0, arr, PATTERN)
    for ( i in arr) {print NR arr[i]}
}

returns:

2abc (
2
4(
4

Thanks

Upvotes: 1

Views: 170

Answers (1)

Ed Morton

Reputation: 204184

What you have in your question is a regexp, so call it that instead of the highly ambiguous "pattern". See "How do I find the text that matches a pattern?" for more info on that topic.

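You don't need to provide the regexp twice. split() returns the number of elements it created, so a result greater than 1 means the regexp matched somewhere in the line. Just do this instead: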

split($0, arr, /\(..-av-es\/.*\)/) > 1 {
    ...
}
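As a minimal sketch of that condition in action, the following feeds the sample lines from the question to awk through a small shell wrapper. The shell variable names (input, out) and the bracketed print format are only illustrative choices, not part of the original answer.

```shell
# Sample input taken from the question.
input='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
abc (fd-av-es/key1) value1sdfsdaff
jjjjjjjjjjjjjjjjjjjjjjjjjjj
(sd-av-es/key2) value2sdfsdaff'

out=$(printf '%s\n' "$input" | awk '
# split() returns the number of pieces it produced; more than 1 means
# the regexp matched, so the same call serves as test and as action.
split($0, arr, /\(..-av-es\/.*\)/) > 1 {
    print NR ": [" arr[1] "] [" arr[2] "]"
}')

printf '%s\n' "$out"
```

With that input, only lines 2 and 4 are printed: arr[1] holds the text before the match (empty on line 4) and arr[2] holds the text after it.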

If for some reason you do want to store the regexp in a variable, as you're trying to do, then with GNU awk you can use a strongly typed regexp constant:

BEGIN {
    regexp = @/\(..-av-es\/.*\)/
}

$0 ~ regexp {
    split($0, arr, regexp)
    ...
}

Or, with any other awk, use a dynamic regexp. That is a string, which awk parses twice: first as a string literal, where backslash escapes are processed, and then again as a regexp when it is used. So you need to double the escapes:

BEGIN {
    regexp = "\\(..-av-es\\/.*\\)"
}

$0 ~ regexp {
    split($0, arr, regexp)
    ...
}
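A small sketch of the doubled-escape version, run against the sample lines from the question, might look like this. The shell wrapper, the variable names input, out and n, and the output format are assumptions made for illustration only.

```shell
# Sample input taken from the question.
input='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
abc (fd-av-es/key1) value1sdfsdaff
jjjjjjjjjjjjjjjjjjjjjjjjjjj
(sd-av-es/key2) value2sdfsdaff'

out=$(printf '%s\n' "$input" | awk '
BEGIN {
    # In a string literal each \\ collapses to a single backslash, so the
    # regexp awk finally compiles is \(..-av-es\/.*\)
    regexp = "\\(..-av-es\\/.*\\)"
}

$0 ~ regexp {
    n = split($0, arr, regexp)
    print NR ": " n " pieces, last = [" arr[n] "]"
}')

printf '%s\n' "$out"
```

Here both the match test and split() see literal parentheses, so each matching line is split into the text before and after the "(..-av-es/...)" part, the same result as with the regexp constant.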

See https://www.gnu.org/software/gawk/manual/gawk.html#Using-Constant-Regexps and https://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps for more info on the difference between dynamic regexps, constant regexps, and strongly typed regexp constants.

Upvotes: 5
