Reputation: 13249
In awk, the field (or record) separator FS (or RS) can be set as a regular expression.
It works great for getting any individual field, but once you set one of these fields, the field separators are "gone".
echo "a|b-c|d" | awk 'BEGIN{FS="[|-]"} {$3="z"}1'
a b z d
In this case the output field separator OFS is set to a space by default.
Unfortunately, a statement like OFS=FS="[|-]" does not work, because it sets OFS to a literal string.
I understand that it might get tricky for awk to select the output field separator if there are several choices, but in case of no new fields, the current ones could remain.
So, is there an easy way to set OFS to be the exact same regex as FS, such that I get this?
echo "a|b-c|d" | awk '... {$3="z"}1'
a|b-z|d
Alternatively, is there a way to capture all separators, in an array for example?
The same question also applies to the record separator RS (and its associated ORS).
Upvotes: 6
Views: 1619
Reputation: 289725
As you already mentioned, there is no way to set OFS dynamically based on which FS match was used in each case. If the regex was in RS instead of FS, you could use RT (in fact, I just saw that anubhava's answer does this, nice!).
However, there is another way if you have GNU awk: as seen in column replacement with awk, with retaining the format (Ed Morton's answer), you can use split() and, especially, its 4th argument. Why? Because it stores the separator between every slice:
gawk 'BEGIN{FS="[|-]"}                  # set FS
{
    split($0, a, FS, seps)              # split based on FS: pieces go in a[] and ...
                                        # ... the separators go in the array seps[]
    a[3]="z"                            # change the 3rd field
    for (i=1;i<=NF;i++)                 # print the data back
        printf "%s%s", a[i], seps[i]    # keeping the original separators
    print ""                            # print a new line
}'
As a one-liner:
$ gawk 'BEGIN{FS="[|-]"} {split($0, a, FS, seps); a[3]="z"; for (i=1;i<=NF;i++) printf "%s%s", a[i], seps[i]; print ""}' <<< "a|b-c|d"
a|b-z|d
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
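To make the seps[i] indexing concrete, here is a small sketch that only lists the captured separators. It assumes gawk is installed (the 4th argument of split() is a gawk extension), and list_seps is a hypothetical helper name used for illustration.

```shell
# list_seps REGEX: print every separator that split() captured, one per line.
# Requires GNU awk; list_seps is a made-up name for this sketch.
list_seps() {
    gawk -v re="$1" '{
        n = split($0, parts, re, seps)       # pieces in parts[], separators in seps[]
        for (i = 1; i < n; i++)              # seps[i] sits between parts[i] and parts[i+1]
            printf "seps[%d]=\"%s\"\n", i, seps[i]
    }'
}
# usage: echo "a|b-c|d" | list_seps '[|-]'
```

For the input a|b-c|d this should list three separators, |, - and |, in that order, which is the "capture all separators in an array" part of the question.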
Upvotes: 5
Reputation: 785156
awk rewrites each record using OFS if you change any field value using $N=<whatever> (where N is the field number).
Since you're using multiple delimiters in FS, you cannot use OFS=FS.
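A quick way to see both effects, as a minimal sketch with the sample input from the question (the function names are only for illustration): an untouched record is printed as-is, while assigning to a field rebuilds the record with OFS, and OFS=FS copies the regex text as a plain string.

```shell
# No field is assigned, so $0 is printed unchanged.
show_untouched() {
    awk 'BEGIN { FS = "[|-]" } 1'
}

# A field is assigned, so $0 is rebuilt with OFS, which here holds
# the literal characters [|-] rather than a regex.
show_literal_ofs() {
    awk 'BEGIN { FS = "[|-]"; OFS = FS } { $3 = "z" } 1'
}

echo "a|b-c|d" | show_untouched      # prints a|b-c|d
echo "a|b-c|d" | show_literal_ofs    # prints a[|-]b[|-]z[|-]d
```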
If you have GNU awk, then you can use an RS and RT based solution:
s="a|b-c|d"
awk -v RS='[-|]' 'NR==3{$0="z"} {printf "%s%s", $0, RT}' <<< "$s"
a|b-z|d
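The same RT trick also covers the RS/ORS half of the question: when records are modified as a whole, printing RT instead of relying on ORS puts back whatever text actually matched RS. A small sketch, assuming gawk (RT is a gawk extension); edit_record is a hypothetical helper name.

```shell
# edit_record N REGEX: upper-case record N and keep every original record separator.
# Requires GNU awk; edit_record is a made-up name for this sketch.
edit_record() {
    gawk -v n="$1" -v RS="$2" '
        NR == n { $0 = toupper($0) }     # change only the chosen record
        { printf "%s%s", $0, RT }        # RT holds the text that matched RS
    '
}
# usage: printf 'one;two,three\n' | edit_record 2 '[;,]'
```

With that input the result should be one;TWO,three: the last record has no trailing separator, so RT is empty there and the final newline is simply part of the record.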
Alternatively, you can use sed:
s="a|b-c|d"
sed -E 's/^(([^|-]+[|-]){2})[^|-]+/\1z/' <<< "$s"
a|b-z|d
Upvotes: 3
Reputation: 37404
Since you clearly don't need to work with the fields, just process $0 in other ways, like below with sub. Note that this replaces the first c found in the record, not specifically the third field:
$ echo "a|b-c|d" | awk 'BEGIN{FS="[|-]"} {sub(/c/,"z")}1'
a|b-z|d
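If the replacement has to target a field by position and GNU awk is not available, processing $0 directly can be pushed a bit further with match(), which is in POSIX awk. The following is only a sketch under some assumptions: replace_field is a hypothetical helper, the separator regex must not match the empty string, and the special whitespace handling of the default FS is not emulated.

```shell
# replace_field N NEW REGEX: replace field N with NEW, keeping the original separators.
# Works with any POSIX awk; replace_field is a made-up name for this sketch.
replace_field() {
    awk -v n="$1" -v new="$2" -v re="$3" '
    {
        rest = $0; out = ""; i = 1
        # match() sets RSTART/RLENGTH to the leftmost separator in rest
        while (match(rest, re) && RLENGTH > 0) {
            piece = substr(rest, 1, RSTART - 1)      # text before the separator
            sep   = substr(rest, RSTART, RLENGTH)     # the separator itself
            field = (i == n) ? new : piece
            out   = out field sep
            rest  = substr(rest, RSTART + RLENGTH)    # continue after the separator
            i++
        }
        # whatever remains is the last field
        field = (i == n) ? new : rest
        print out field
    }'
}

echo "a|b-c|d" | replace_field 3 z '[|-]'    # prints a|b-z|d
```

Since the record is rebuilt by hand, OFS never comes into play.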
Upvotes: 0