Santhosh

Reputation: 11844

awk: length() function not working properly

I am using match(string, /regex/, array) or split(string, array, /regex/) in awk, and I want to know the length of the array.

Here length() works

awk 'BEGIN{a[1]="sometext";print length(a)}'
output: 1

Here it's not working:

awk 'BEGIN{
    str="some text simple test";
    match(str,/(test)/,a);
    print "a[1]: "a[1];
    print length(a)
}'
output:
a[1]: test
6

Here it's strange that the length increases:

awk 'BEGIN{
    str="some text simple test";
    match(str,/(test)/,a);
    print "a[1]: "a[1];
    print "a[2]: "a[2];
    print length(a)
}'
output:
a[1]: test
a[2]: 
7

Why is length() not working, and why does it give weird output?

I have mostly found the reason, but I do not understand why a[2] gets created even though it does not exist. Ideally, referencing it should not create an element if it does not exist. See the commented line:

$ awk 'BEGIN{
    str="some text simple test";
    match(str,/simple (test)/,a);
    print "a[0]: "a[0];
    print "a[1]: "a[1];
    print "a[2]: "a[2]; # a[2] does not exist, but it gets created. Ideally it should not be created if it does not exist
    print "length(a): "length(a)
    k = 0
    for(i in a){
        print "["i"]: "a[i]
        k++
    }
    print "length: "k
    print "RLENGTH::"RLENGTH
    print "RSTART::"RSTART
}'

OUTPUT:
a[0]: simple test
a[1]: test
a[2]: 
length(a): 7
[0start]: 11
[0length]: 11
[1start]: 18
[1length]: 4
[0]: simple test
[1]: test
[2]: 
length: 7
RLENGTH::11
RSTART::11

Upvotes: 0

Views: 2024

Answers (2)

oguz ismail

Reputation: 50815

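Firstly, regarding the length increasing in your third example: it's not actually strange. Just referencing array[subscript] is almost the same as array[subscript]="". If the element does not exist, it is created with an empty value, so print "a[2]: "a[2] adds a[2] to the array.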

Secondly, regarding why length() is not working and giving weird output: it is working. The gawk manual says:

match(s, r [, a])

...

If array a is provided, a is cleared and then elements 1 through n are filled with the portions of s that match the corresponding parenthesized subexpression in r. The zero'th element of a contains the portion of s matched by the entire regular expression r. Subscripts a[n, "start"], and a[n, "length"] provide the starting index in the string and length respectively, of each matching substring.

So length() counts a[0,"start"], a[0,"length"], etc. as well.

Upvotes: 3

steffen

Reputation: 17058

gawk's match() sets the start index and the length for each matched group. As usual, element 0 holds the text matched by the whole pattern. So you get three entries (the start, the length, and the matched value) for group 0 (the whole match) and three more for group 1 (the parenthesized group). Check the output of this:

gawk 'BEGIN{str="some text simple test"; match(str, /(test)/, a); for (i in a) print i":"a[i]}'
0start:18
0length:4
1start:18
1length:4
0:test
1:test

Upvotes: 4
