borrible

Reputation: 17336

Output field separators in awk after substitution in fields

Is it always the case, after modifying a specific field in awk, that information on the output field separator is lost? What happens if there are multiple field separators and I want them to be recovered?

For example, suppose I have a simple file example that contains:

a:e:i:o:u

If I just run an awk script, which takes account of the input field separator, that prints each line in my file, such as running

awk -F: '{print $0}' example

I will see the original line. If however I modify one of the fields directly, e.g. with

awk -F: '{$2=$2"!"; print $0}' example

I do not get back a modified version of the original line; rather, I see the fields separated by the default output field separator, a single space, i.e.:

a e! i o u

I can get back a modified version of the original by specifying OFS, e.g.:

awk -F: 'BEGIN {OFS=":"} {$2=$2"!"; print $0}' example
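To make the behaviour above easier to see, here is a small sketch (the function name show_rebuild and the choice of "-" as OFS are purely illustrative). It prints $0 before and after a field assignment; the separators in the second line come from OFS, not from the input.

```shell
# Hypothetical demo: $0 keeps its original text until a field is assigned.
show_rebuild() {
    awk -F: -v OFS=- '{
        print "before: " $0
        $2 = $2 "!"    # assigning to a field makes awk rebuild $0 with OFS
        print "after: " $0
    }'
}

printf 'a:e:i:o:u\n' | show_rebuild
# before: a:e:i:o:u
# after: a-e!-i-o-u
```

Once the record has been rebuilt this way, only OFS appears between the fields in $0.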

However, where there are multiple potential field separators, is there a simple way of restoring the original separators?

For example, if example had both : and ; as separators, I could use -F":|;" to process the file, but OFS would not be sufficient to restore the original separators in their relative positions.

More explicitly, if we switched to example2 containing

a:e;i:o;u

we could use

awk -F":|;" 'BEGIN {OFS=":"} {$2=$2"!"; print $0}' example2

(or -F"[:;]") to get

a:e!:i:o:u

but we've lost the distinction between : and ; which would have been maintained if we could recover

a:e!;i:o;u

Upvotes: 4

Views: 1007

Answers (1)

Ed Morton

Reputation: 203189

You need to use GNU awk for the 4th arg to split() which saves the separators, like RT does for RS:

$ awk -F'[:;]' '{split($0,f,FS,s); $2=$2"!"; r=s[0]; for (i=1;i<=NF;i++) r=r $i s[i]; $0=r} 1' file
a:e!;i:o;u

There is no automatically populated array of FS matching strings because of how expensive it'd be in time and memory to store the string that matches FS every time you split a record into fields. Instead the GNU awk folks provided a 4th arg to split() so you can do it yourself if/when you want it. That is the result of a long conversation a few years ago in the comp.lang.awk newsgroup between experienced awk users and gawk providers before all agreeing that this was the best approach.
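As a rough illustration of what that 4th argument holds, the sketch below prints each field next to the separator that followed it. It assumes gawk is installed under that name; show_seps is a hypothetical helper name, not part of gawk.

```shell
# Sketch only: needs GNU awk (gawk); show_seps is a hypothetical helper name.
show_seps() {
    gawk '{
        # The 4th argument s receives the text matched between f[i] and f[i+1].
        n = split($0, f, /[:;]/, s)
        for (i = 1; i < n; i++)
            printf "f[%d]=%s s[%d]=%s\n", i, f[i], i, s[i]
        # The last field has no separator after it.
        printf "f[%d]=%s\n", n, f[n]
    }'
}
```

Running printf 'a:e;i:o;u\n' | show_seps should list f[1]=a with s[1]=:, f[2]=e with s[2]=;, and so on up to f[5]=u, which is the information the loop in the one-liner uses to rebuild the record.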

See split() at https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions.
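If gawk is not available, something similar can be approximated in POSIX awk. This is not the split() approach above but a manual match() loop, and it is only a sketch under assumptions: separators are single characters from [:;], the function name bang_field2 is made up for illustration, and a line with fewer than two fields is left without a new field 2.

```shell
# Hypothetical helper: append "!" to field 2 while keeping each original separator.
# Assumes separators are single characters from the set [:;].
bang_field2() {
    awk '{
        rest = $0; out = ""; n = 0
        # Peel off one field and the separator that follows it per iteration.
        while (match(rest, /[:;]/)) {
            n++
            field = substr(rest, 1, RSTART - 1)
            sep   = substr(rest, RSTART, RLENGTH)
            if (n == 2) field = field "!"
            out  = out field sep
            rest = substr(rest, RSTART + RLENGTH)
        }
        # Whatever remains is the last field, which has no trailing separator.
        n++
        if (n == 2) rest = rest "!"
        print out rest
    }'
}

printf 'a:e;i:o;u\n' | bang_field2
# a:e!;i:o;u
```

The trade-off is that the record is scanned by hand instead of relying on awk's own field splitting, so any change to the separator pattern has to be made in the match() call as well.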

Upvotes: 5
