chandra

Reputation: 19

Using a field separator in awk only when it is not escaped

I have one question. Suppose I am using "=" as the field separator and my string contains, for example:

abc=def\=jkl 

If I use = as the field separator, it will be split into 3 fields:

abc def\ jkl 

But since I have escaped the 2nd "=", my output should be:

abc def\=jkl

Can anyone suggest how I can achieve this? Thanks in advance.

Upvotes: 1

Views: 89

Answers (2)

Ed Morton

Reputation: 204099

I find it simplest to just convert the offending string to some other string or character that doesn't appear in your input records (I tend to use RS if it's not a regexp* since that cannot appear within a record, or the awk builtin SUBSEP otherwise since if that appears in your input you have other problems) and then process as normal other than converting back within each field when necessary, e.g.:

$ cat file
abc=def\=jkl

$ awk -F= '{
   gsub(/\\=/,RS)
   for (i=1; i<=NF; i++) {
      gsub(RS,"\\=",$i)
      print i":"$i
   }
}' file
1:abc
2:def\=jkl

* The issue with using RS if it is an RE (i.e. multiple characters) is that the gsub(RS...) within the loop could match a string that didn't get resolved to a record separator initially, e.g.

$ echo "aa" | gawk -v RS='a$' '{gsub(RS,"foo",$1); print "$1=<"$1">"}'
$1=<afoo>

When the RS is a single character, e.g. the default newline, that cannot happen so it's safe to use.
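When RS is a multi-character regexp, the answer suggests the awk builtin SUBSEP (by default the control character \034) as the placeholder instead. A minimal sketch of that variant, assuming POSIX awk and wrapping the same loop in a hypothetical shell function named `split_escaped_eq`:

```shell
# Hypothetical helper: same technique as the answer's code, but with SUBSEP
# instead of RS as the placeholder for an escaped "\=".
split_escaped_eq() {
    awk -F= '{
        # Hide each escaped "\=" behind SUBSEP; modifying $0 re-splits it on "=".
        gsub(/\\=/, SUBSEP)
        for (i = 1; i <= NF; i++) {
            # Put the escaped "\=" back inside this field before printing it.
            gsub(SUBSEP, "\\=", $i)
            print i ":" $i
        }
    }' "$@"
}

printf 'abc=def\\=jkl=foo\\=bar=baz\n' | split_escaped_eq
```

With that input it should print 1:abc, 2:def\=jkl, 3:foo\=bar and 4:baz. As with the RS version, this assumes the placeholder character never occurs in the real data.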

Upvotes: 3

Kent

Reputation: 195209

If your data is like the example in your question, it can be done.

awk doesn't support look-around in regular expressions, so it would be difficult to get what you want just by setting FS.
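To illustrate why FS alone struggles here: without look-behind, the nearest regex is "a character that is not a backslash, followed by =", written [^\\]= in awk. A small sketch (the function name `naive_split` is hypothetical) shows that this leaves the escaped = alone but swallows the character just before each real separator:

```shell
# Hypothetical helper: "split on = not preceded by a backslash", approximated
# without look-behind as "any non-backslash character followed by =".
naive_split() {
    awk '{
        # [^\\] means "any character except backslash", so each matched
        # separator is two characters wide, e.g. "c=" in "abc=def".
        n = split($0, parts, /[^\\]=/)
        for (i = 1; i <= n; i++)
            print i ":" parts[i]
    }' "$@"
}

printf 'abc=def\\=jkl\n' | naive_split
```

This prints 1:ab and 2:def\=jkl: the escape is respected, but the c of abc is consumed as part of the separator, which is exactly what a look-behind would avoid.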

If I were you, I would do some preprocessing to make the data easier for awk to handle. Alternatively, you could read the line and use other awk functions, e.g. gensub(), to deal with the escaped = characters that should not act as separators, and then split()... But I guess you want to achieve the goal by playing with the field separator, so I won't give those solutions here.

However, it can be done with the FPAT variable (a GNU awk extension).

awk -vFPAT='\\w*(\\\\=)?\\w*' '...' file

This works for your example. I am not sure whether it will work for your real data.
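Since FPAT is gawk-only and \w restricts fields to word characters, here is a portable sketch of the same "describe the fields, not the separators" idea. It is not from the answer: it calls match() in a loop with a field pattern meaning "one or more characters that are either neither = nor backslash, or a backslash followed by any character". The function name `split_unescaped` is hypothetical.

```shell
# Hypothetical helper: print the fields of each line, splitting only on "="
# that is not escaped with a backslash (portable emulation of an FPAT pattern).
split_unescaped() {
    awk '{
        s = $0
        n = 0
        # A field is a run of characters that are either not "=" or "\",
        # or an escape pair: a backslash followed by any character.
        while (match(s, /([^=\\]|\\.)+/)) {
            n++
            print n ":" substr(s, RSTART, RLENGTH)
            # Continue scanning just after the field that was found.
            s = substr(s, RSTART + RLENGTH)
        }
    }' "$@"
}

printf 'abc=def\\=jkl=foo\\=bar=baz\n' | split_unescaped
```

This should print the same four fields as the FPAT example in this answer. The escape-pair alternative also accepts non-word characters (spaces, dots) and treats \\ as an escaped backslash, so in a\\=b the = is a real separator and the fields are a\\ and b. Like a +-based FPAT pattern, it skips empty fields: a==b yields just a and b.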

Let's take an example and split this string: "abc=def\=jkl=foo\=bar=baz"

kent$  echo "abc=def\=jkl=foo\=bar=baz"|awk -vFPAT='\\w*(\\\\=)?\\w*' '{for(i=1;i<=NF;i++)print $i}'
abc
def\=jkl
foo\=bar
baz

I think that's the result you want, isn't it?

My awk version:

kent$  awk --version|head -1
GNU Awk 4.0.2

Upvotes: 1
