I am using match(string, /regex/, array)
or split(string, array, /regex/)
in awk, and I want to know the length of the array.
Here length() works:
awk 'BEGIN{a[1]="sometext";print length(a)}'
output: 1
Here it does not give the result I expect:
awk 'BEGIN{
str="some text simple test";
match(str,/(test)/,a);
print "a[1]: "a[1];
print length(a)
}'
output:
a[1]: test
6
Here, strangely, the length is incremented:
awk 'BEGIN{
str="some text simple test";
match(str,/(test)/,a);
print "a[1]: "a[1];
print "a[2]: "a[2];
print length(a)
}'
output:
a[1]: test
a[2]:
7
Why is length() not working and giving weird output?
I have mostly found the reason, but I do not understand why a[2] gets created even though it does not exist. Ideally, a reference should not create an element that does not exist. See the commented line:
$ awk 'BEGIN{
str="some text simple test";
match(str,/simple (test)/,a);
print "a[0]: "a[0];
print "a[1]: "a[1];
print "a[2]: "a[2]; # a[2] does not exist, but this reference creates it; ideally it should not
print "length(a): "length(a)
k = 0
for(i in a){
print "["i"]: "a[i]
k++
}
print "length: "k
print "RLENGTH::"RLENGTH
print "RSTART::"RSTART
}'
OUTPUT:
a[0]: simple test
a[1]: test
a[2]:
length(a): 7
[0start]: 11
[0length]: 11
[1start]: 18
[1length]: 4
[0]: simple test
[1]: test
[2]:
length: 7
RLENGTH::11
RSTART::11
Upvotes: 0
First, regarding the length growing after a[2] is printed: that is not strange. Merely referencing array[subscript] is almost the same as assigning array[subscript]="", so the reference itself creates the element.
As for why length() seems not to work and gives unexpected output: it is working. The gawk manual says:
match(s, r [, a])
...
If array a is provided, a is cleared and then elements 1 through n are filled with the portions of s that match the corresponding parenthesized subexpression in r. The zero'th element of a contains the portion of s matched by the entire regular expression r. Subscripts a[n, "start"], and a[n, "length"] provide the starting index in the string and length respectively, of each matching substring.
So, length counts a[0,"start"], a[0,"length"] etc. as well.
Upvotes: 3
gawk's match() stores the start index and the length of each matched group in the array, alongside the matched text. As usual, element 0 holds the match of the whole pattern. So you get three entries each for group 0 (the whole match) and group 1 (the parenthesized group): the start, the length and the value. That is why length(a) reports 6. Check the output of this:
gawk 'BEGIN{str="some text simple test"; match(str, /(test)/, a); for (i in a) print i":"a[i]}'
0start:18
0length:4
1start:18
1length:4
0:test
1:test
Upvotes: 4