Reputation: 38701
As far as I can see, if I want to split a string with a regex and keep the delimiters in Perl, JavaScript or PHP, I should use a capturing group (parentheses) in the regex; e.g. in Perl (where I want to split at a single digit followed by a right parenthesis):
$ echo -e "123.123 1) 234.234\n345.345 0) 456.456" \
| perl -ne 'print join("--", split(/(\d\))/,$_));'
123.123 --1)-- 234.234
345.345 --0)-- 456.456
I'm trying the same trick in awk, but it doesn't seem to work: the delimiters are still "eaten", even when a capturing group is used:
$ echo -e "123.123 1) 234.234\n345.345 0) 456.456" \
| awk '{print; n=split($0,a,/([0-9]\))/);for(i=1;i<=n;i++){print i,a[i];}}'
123.123 1) 234.234
1 123.123
2 234.234
345.345 0) 456.456
1 345.345
2 456.456
Can awk be forced to keep the delimiter matches in the array produced by split?
Upvotes: 0
Views: 4554
Reputation: 204638
As @konsolebox mentioned, you can use split() with newer gawk versions to save the field separator values. You could also take a look at FPAT and patsplit(). Another alternative would be to set RS to your current FS regexp and then use RT.
Having said that, I don't understand why you're thinking of a solution involving field separators when you could solve the problem you posted with just a gensub() in gawk:
$ echo -e "123.123 1) 234.234\n345.345 0) 456.456" |
gawk '{print gensub(/[[:digit:]]\)/, "--&--", 1)}'
123.123 --1)-- 234.234
345.345 --0)-- 456.456
If there's a different problem you're really trying to solve that'd require remembering the FS values, let us know and we can point you in the right direction.
Upvotes: 1
Reputation: 75618
You can use split() in gawk, e.g.:
echo -e "123.123 1) 234.234\n345.345 0) 456.456" |
gawk '{
nf = split($0, a, /[0-9]\)/, seps)
for (i = 1; i < nf; ++i) printf "%s--%s--", a[i], seps[i]
print a[i]
}'
Output:
123.123 --1)-- 234.234
345.345 --0)-- 456.456
The version of split() in GNU awk (gawk) accepts an optional fourth argument: the name of an array in which, if present, the matched separators are saved.
As noted in Gawk's manual:
split(s, a [, r [, seps] ])
Split the string s into the array a and the separators array seps on the regular expression r, and return the number of
fields. If r is omitted, FS is used instead. The arrays a and seps are cleared first. seps[i] is the field separator
matched by r between a[i] and a[i+1]. If r is a single space, then leading whitespace in s goes into the extra array element
seps[0] and trailing whitespace goes into the extra array element seps[n], where n is the return value of split(s, a, r,
seps). Splitting behaves identically to field splitting, described above.
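The four-argument split() quoted above is a gawk extension. Where only a plain POSIX awk is available (an assumption here, e.g. mawk or a BSD awk), a similar result can be approximated with match(), which sets RSTART and RLENGTH for each hit. This is a different technique from split(), shown only as a sketch; the function name split_keep is hypothetical.

```shell
# Walk through each line with match(), copying the text before every
# delimiter and the delimiter itself (wrapped in "--") into the output.
split_keep() {
  awk '{
    rest = $0
    out = ""
    while (match(rest, /[0-9]\)/)) {
      out = out substr(rest, 1, RSTART - 1) "--" substr(rest, RSTART, RLENGTH) "--"
      rest = substr(rest, RSTART + RLENGTH)   # continue after the match
    }
    print out rest                            # text after the last delimiter
  }'
}

printf '123.123 1) 234.234\n345.345 0) 456.456\n' | split_keep
```

Instead of building a string, the same loop could store the pieces and the delimiters in two arrays, which would mimic the a and seps arrays of gawk's split().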
Upvotes: 4