oliv

Reputation: 13249

AWK: Is there a way to set OFS as FS if this one is a regex?

In awk, the field separator FS (or the record separator RS) can be set to a regular expression. That works well for reading any individual field, but once you assign to one of the fields, the original field separators are gone.

echo "a|b-c|d" | awk 'BEGIN{FS="[|-]"} {$3="z"}1'
a b z d

In this case the output field separator OFS defaults to a single space.

Unfortunately, a statement such as OFS=FS="[|-]" does not work, because it sets OFS to the literal string "[|-]".

I understand that it could be tricky for awk to pick an output field separator when there are several candidates, but when no new fields are added, the existing separators could simply be kept.
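To illustrate, here is a minimal sketch of what that assignment does with the sample input. The shell variable `out` is only there for the illustration; the regex text itself ends up between the fields:

```shell
# OFS receives the regex text itself, not "whichever separator matched"
out=$(echo "a|b-c|d" | awk 'BEGIN{OFS=FS="[|-]"} {$3="z"}1')
echo "$out"   # prints: a[|-]b[|-]z[|-]d
```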

So, is there an easy way to set OFS to be the exact same regex as FS, such that I get this?

echo "a|b-c|d" | awk '... {$3="z"}1'
a|b-z|d

Alternatively, is there a way to capture all the separators, in an array for example?

The same question also applies to the record separator RS (and its associated ORS).

Upvotes: 6

Views: 1619

Answers (3)

fedorqui

Reputation: 289725

As you already mentioned, there is no way to set OFS dynamically based on whichever FS matched in each case. If the regex were in RS instead of FS, you could use RT (in fact, I just saw that anubhava's answer does this, nice!).

However, there is another way if you have GNU awk: as seen in "column replacement with awk, with retaining the format" (Ed Morton's answer), you can use split() and, especially, its 4th argument. Why? Because it stores the separator found between each pair of pieces:

gawk 'BEGIN{FS="[|-]"}                     # set FS
     {split($0, a, FS, seps)               # split based on FS and ...
                                           # ...  store the separators in the array seps
      a[3]="z"                             # change the 3rd field
      for (i=1;i<=NF;i++)                  # print the data back
           printf "%s%s", a[i], seps[i]    # keeping the separators
      print ""                             # print a new line
     }'

As one-liner:

$ gawk 'BEGIN{FS="[|-]"} {split($0, a, FS, seps); a[3]="z"; for (i=1;i<=NF;i++) printf "%s%s", a[i], seps[i]; print ""}' <<< "a|b-c|d"
a|b-z|d

split(string, array [, fieldsep [, seps ] ])

Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
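If GNU awk is not available, the same idea can be approximated with match(), RSTART and RLENGTH, which are standard awk. The following is only a sketch under some assumptions: the function name `replace_third`, the variable `re` and the arrays `field` and `sep` are made up for this example, and the regex is assumed never to match the empty string. It does not reproduce the special handling of a single-space separator described above.

```shell
# Hypothetical helper: replace the 3rd field with "z" while keeping
# the original separators, using only standard awk features.
replace_third() {
  awk -v re='[|-]' '
  {
    rest = $0; n = 0
    # Peel off one field and one separator at a time.
    # The RLENGTH guard avoids looping forever on an empty match.
    while (match(rest, re) && RLENGTH > 0) {
      n++
      field[n] = substr(rest, 1, RSTART - 1)
      sep[n]   = substr(rest, RSTART, RLENGTH)
      rest     = substr(rest, RSTART + RLENGTH)
    }
    n++; field[n] = rest; sep[n] = ""   # last field has no separator after it

    if (n >= 3) field[3] = "z"          # change the 3rd field, if present

    out = ""
    for (i = 1; i <= n; i++) out = out field[i] sep[i]
    print out
  }'
}

out=$(echo "a|b-c|d" | replace_third)
echo "$out"   # prints: a|b-z|d
```

Here sep[i] plays the role of seps[i] in the gawk version: it is the separator found between field[i] and field[i+1].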

Upvotes: 5

anubhava

Reputation: 785156

awk rewrites each record using OFS if you change any field value using $N=<whatever> (where N is the field number).

Since you're using multiple delimiters in FS, you cannot use OFS=FS.

If you have GNU awk, then you can use a solution based on RS and RT:

s="a|b-c|d"
gawk -v RS='[-|]' 'NR==3{$0="z"} {printf "%s%s", $0, RT}' <<< "$s"

a|b-z|d

Alternatively you can use sed:

s="a|b-c|d"
sed -E 's/^(([^|-]+[|-]){2})[^|-]+/\1z/' <<< "$s"

a|b-z|d

Upvotes: 3

James Brown

Reputation: 37404

Since you clearly don't need to work with the fields, just process $0 in some other way, for example with sub():

$ echo "a|b-c|d" | awk 'BEGIN{FS="[|-]"} {sub(/c/,"z")}1'
a|b-z|d

Upvotes: 0
