freza19

Reputation: 29

How to use awk with a multi-character delimiter

How can I use awk with a delimiter that contains multiple characters: "#@$"?

I have a file like this:

Test1#@$Test2#@$Test3#@$Test4

I need to extract 'Test2'. When I run awk -F "#@$" '{print $2}', nothing is displayed.

And when I run awk -F "#@$" '{print $1}', I get the full line.

Any ideas?

Upvotes: 0

Views: 273

Answers (2)

kvantour

Reputation: 26471

The issue is that the field separator FS is treated as a regular expression. The <dollar> character ($) has a special meaning in regular expressions: it anchors the match to the end of the string. The solution is to escape it twice, because <backslash> escapes are interpreted twice: once in lexical processing of the string and once in processing the regular expression:

awk -F '#@\\$' '{print $2}'
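As a quick check of the double escape, here is a minimal sketch run against the sample line from the question. The variable names line and second are only illustrative; the single quotes around the data keep the shell from touching the $ characters.

```shell
# Sample line from the question.
line='Test1#@$Test2#@$Test3#@$Test4'

# The shell passes #@\\$ to awk unchanged. awk's string processing turns \\
# into \, so the regular expression becomes #@\$ with a literal dollar sign.
second=$(printf '%s\n' "$line" | awk -F '#@\\$' '{print $2}')

echo "$second"   # prints Test2
```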

An extended regular expression can be used to separate fields by assigning a string containing the expression to the built-in variable FS, either directly or as a consequence of using the -F sepstring option. The default value of the FS variable shall be a single <space>. The following describes FS behaviour:

  1. If FS is a null string, the behaviour is unspecified.
  2. If FS is a single character:

    • If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters.
    • Otherwise, if FS is any other character c, fields shall be delimited by every single occurrence of c.
  3. Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.

source: POSIX awk standard
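The three FS cases quoted above can be tried out with a small sketch. The inputs and variable names are made up for illustration and are not from the question.

```shell
# Rule 2, FS is a single space: leading blanks are skipped and runs of
# blanks count as one separator, so this line has 3 fields.
blank_fields=$(printf '  a   b c\n' | awk -F ' ' '{print NF}')

# Rule 2, any other single character: every occurrence delimits, so the
# empty string between the two commas is a field of its own (3 fields).
comma_fields=$(printf 'a,,b\n' | awk -F ',' '{print NF}')

# Rule 3, anything longer: FS is an extended regular expression, so each
# run of digits acts as one separator and the second field is b.
regex_field=$(printf 'a12b345c\n' | awk -F '[0-9]+' '{print $2}')

echo "$blank_fields $comma_fields $regex_field"   # prints 3 3 b
```

The separator from the question, #@$, is longer than one character, so it falls under rule 3.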


A <dollar-sign> ($) outside a bracket expression shall anchor the expression or subexpression it ends to the end of a string; such an expression or subexpression can match only a sequence ending at the last character of a string. For example, the EREs ef$ and (ef$) match ef in the string abcdef, but fail to match in the string cdefab, and the ERE e$f is valid, but can never match because the f prevents the expression e$ from matching ending at the last character.

source: POSIX Extended Regular Expressions
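The anchoring behaviour can be seen with a short sketch that reuses the abcdef and cdefab strings from the quote and then the unescaped separator from the question. The variable names are illustrative only.

```shell
# ef$ matches only where ef sits at the end of the line, so abcdef is
# printed and cdefab is not.
anchored=$(printf 'abcdef\ncdefab\n' | awk '/ef$/ {print $0}')

# With the unescaped separator, $ is an anchor: the sample line does not end
# in #@, so nothing is split and the whole line stays in $1 (NF is 1).
nf=$(printf '%s\n' 'Test1#@$Test2#@$Test3#@$Test4' | awk -F '#@$' '{print NF}')

echo "$anchored $nf"   # prints abcdef 1
```

This matches what the question reports: $1 holds the full line and $2 is empty.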

Upvotes: 1

stack0114106

Reputation: 8711

Just wrap $ in brackets, [$], to remove its special meaning:

> cat t1
Test1#@$Test2#@$Test3#@$Test4
> awk -F '#@[$]' '{print $2}' t1
Test2
> 
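The same bracket expression should also work outside of -F. The sketch below is one possible variation that uses awk's split() with a regular expression literal; the names line, parts, n, and result are illustrative.

```shell
# Sample line from the question.
line='Test1#@$Test2#@$Test3#@$Test4'

# split() cuts the record on every match of #@[$] and returns the number of
# pieces; inside the brackets $ is an ordinary character, so no backslashes
# are needed.
result=$(printf '%s\n' "$line" | awk '{ n = split($0, parts, /#@[$]/); print n, parts[2] }')

echo "$result"   # prints 4 Test2
```

Because the bracket form contains no backslashes, it reads the same whether it is given as a string to -F or as a regular expression literal.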

Upvotes: 1
