Reputation: 6737
Example: n=3, the input is
foo bar baz
a b c d e f g h
12 34 5 678
the output should be:
13
5
8
Upvotes: 2
Views: 63
Reputation: 204258
This will work for any field separator, including multi-char regexp, using GNU awk for the 4th arg to split():
$ cat tst.awk
{
    split($0,flds,FS,seps)
    indent = 1
    for (i=0; i<n; i++) {
        indent += length(flds[i] seps[i])
    }
    print indent
}
$ awk -v n=3 -f tst.awk file
13
5
8
or with multi-char strings of .+. or .-. between fields:
$ cat file2
foo.+.bar.-.baz
a.+.b.-.c.+.d.-.e.+.f.-.g.+.h
12.-.34.+.5.-.678
$ awk -F'[.][+-][.]' -v n=3 -f tst.awk file2
13
9
11
Note that since we're using FS as an argument to split(), it will be treated as a dynamic regexp (i.e. one stored in a string), so any backslashes in the FS would need to be doubled.
Also note that we start the counting loop at 0, not 1, because with the default FS any leading white space before flds[1] (i.e. before $1) is stored in seps[0]. flds[0] will always be empty, and for a non-default FS seps[0] will also be empty, so no harm is done by including their lengths in all cases.
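To make the seps[0] point concrete, here is a small sketch. It assumes GNU awk is installed as gawk; the helper name show_seps and the sample line are made up for illustration and are not part of the answer above.

```shell
# show_seps (hypothetical helper): report what split() stores for each input line.
# Requires GNU awk, since the 4th argument to split() is a gawk extension.
show_seps() {
    gawk '{
        nf = split($0, flds, FS, seps)
        printf "%d fields, seps[0]=[%s], flds[1]=[%s]\n", nf, seps[0], flds[1]
    }'
}
```

With the default FS, running printf '  foo bar\n' | show_seps should report 2 fields with the two leading spaces held in seps[0] and foo in flds[1], which is why the counting loop has to start at 0.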
Upvotes: 1
Reputation: 74685
You can use match to do this:
$ awk 'match($0, /[[:blank:]]*([^[:blank:]]+[[:blank:]]+){2}/) {
    print RLENGTH + 1
}' file
13
5
8
Or using a parameter with a dynamic regex:
$ awk -v n=3 'match($0, "[[:blank:]]*([^[:blank:]]+[[:blank:]]+){" n - 1 "}") {
    print RLENGTH + 1
}' file
13
5
8
This searches for optional leading blanks (spaces or tabs), followed by n - 1 repetitions of a non-blank run and then a blank run, where n is the word number. match sets the variables RSTART and RLENGTH (in this case, RSTART == 1). RLENGTH gives the length of the match, so one character after that is where the nth word starts.
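If your awk does not support interval expressions such as {2} or the [[:blank:]] class (some older implementations do not), the same pattern can be built by repetition. This is only a sketch of that idea: the helper name nth_word_start is hypothetical, and [ \t] is used as a stand-in for [[:blank:]].

```shell
# nth_word_start N (hypothetical helper): for each input line, print
# RSTART, RLENGTH and the 1-based column where word N starts.
nth_word_start() {
    awk -v n="$1" '
    BEGIN {
        # Optional leading blanks, then n-1 copies of "non-blanks, blanks".
        re = "[ \t]*"
        for (i = 1; i < n; i++)
            re = re "[^ \t]+[ \t]+"
    }
    match($0, re) { print RSTART, RLENGTH, RLENGTH + 1 }
    '
}

printf 'foo   bar   baz\n' | nth_word_start 3   # prints: 1 12 13
```

As with the one-liners above, a line with fewer than n - 1 words followed by blanks produces no output because match fails.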
Since you mentioned GNU awk, you can shorten things by using \s (which is actually [[:space:]], but that works here too) and non-space \S:
$ awk -v n=3 'match($0, "\\s*(\\S+\\s+){" n - 1 "}") { print RLENGTH + 1 }' file
In a dynamic regex (one stored in a string), the backslashes themselves need to be escaped.
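A quick way to see the escaping rule without relying on GNU extensions is to match a literal dot both ways. This is an illustrative sketch; the helper name dot_positions and the sample input are assumptions, not part of the answer.

```shell
# dot_positions (hypothetical helper): find "a.b" with a regex literal and
# with the equivalent dynamic (string) regex, printing both start positions.
dot_positions() {
    awk '{
        match($0, /a\.b/);  lit = RSTART   # regex literal: a single backslash
        match($0, "a\\.b"); dyn = RSTART   # string: the backslash is doubled
        print lit, dyn
    }'
}

printf 'axb a.b\n' | dot_positions   # prints: 5 5
```

Both forms skip axb and find a.b at column 5, because the doubled backslash in the string is reduced to a single one before the regex is compiled.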
Upvotes: 3
Reputation: 37454
The simplest would probably be:
$ awk -v n=3 '{print index($0,$n)}' file
13
5
8
but it's error prone, and would require some checking. $n is the nth field (here the third word), as delimited by the field separator FS. index returns the character position where the first occurrence of that string begins. If FS is the default (runs of blanks), you'd probably want to search for a space followed by the word and add one to the position:
$ awk -v n=3 '{print 1 + index($0," " $n)}' file
13
5
8
... which, as pointed out in the comments, is also error prone: it can break for n=1 or if the nth word matches the beginning of a prior word.
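Two made-up inputs show those failure modes. The helper name naive_pos is hypothetical; it just wraps the one-liner above so it can be run against different lines.

```shell
# naive_pos N (hypothetical helper): the index()-based one-liner from above.
naive_pos() {
    awk -v n="$1" '{ print 1 + index($0, " " $n) }'
}

# The 3rd word "a" starts at column 6, but " a" first occurs inside " ab".
printf 'x ab a\n' | naive_pos 3        # prints 3

# The 1st word starts at column 1, but " foo" is only found before the last word.
printf 'foo bar foo\n' | naive_pos 1   # prints 9
```

Because index always returns the first occurrence of the search string, any earlier word that begins with the same characters wins.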
We could use the seps feature of GNU awk's split:
$ awk -v n=3 '{
    s=1                        # reset s to 1 for each line
    split($0,a,/ +/,b)         # split fields into a and separators into b
    for(i=1;i<n;i++)           # for each field before the nth
        s+=length(a[i] b[i])   # add the lengths of the field and its separator
    print s                    # print the position
}' file
13
5
8
Upvotes: 3