How to set a multicharacter record separator RS in GNU awk so it encompasses the new lines?

Question

I am using GNU Awk 4.1.3. I want to process this file:

$$$$
1
1
$$$$
2
2
$$$$
3
3
$$$$
1
clave
2
$$$$
5
5
$$$$

And print the block of lines that go between "$$$$" and the next "$$$$" when that given block contains the text "clave" in it. That is, with the given example I want this output:

1
clave
2

My solution is to set the record separator RS to the string "$$$$". Since $ is a special character in regular expressions, I need to escape it, so it ends up being RS='\\$\\$\\$\\$':

awk -v RS='\\$\\$\\$\\$' '/clave/' file

The problem with this is that the result contains a newline before and after the block:

$ awk -v RS='\\$\\$\\$\\$' '/clave/' file

1
clave
2

This is because there is a new line between the end of "$$$$" and "1", and there is also a new line between "2" and the next "$$$$".
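
One way to make those stray newlines visible is to print the matching record between brackets. This is only a minimal sketch: the sample is shortened to a single block, the bracket markers and the variable name are illustrative, and the separator is written as [$][$][$][$] just to sidestep escaping. It assumes an awk that accepts a regular expression as RS, as GNU awk does.

```shell
# Shortened, illustrative sample: one block between two "$$$$" lines.
shown=$(printf '%s\n' '$$$$' 1 clave 2 '$$$$' |
  awk -v RS='[$][$][$][$]' '/clave/ { printf "[%s]\n", $0 }')

# The brackets land on lines of their own because the record
# itself starts and ends with a newline.
printf '%s\n' "$shown"
```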

To avoid this, I am adding the newline on both ends of the record separator, so it becomes RS='\n\\$\\$\\$\\$\n'. It works well:

$ awk -v RS='\n\\$\\$\\$\\$\n' '/clave/' file
#            ^^            ^^
1
clave
2

However, this becomes quite complex and I am wondering if including the new line in the record separator may have some side effects that I am not aware of.

For this, I wonder: how can I set the record separator so it encompasses the new lines? Is my approach valid or should I go for other options because my approach has some drawbacks?

Ed Morton · Accepted Answer

You should be matching on the newline before and after the 4 $s, as THAT is the real separator (a string of 4 $s on a line of its own); anything else could fail if 4 $s appeared in your data. The first string of $s won't have a newline before it, of course; it'll match the start-of-string indicator (^) instead, so you need to use:

$ awk -v RS='(^|\n)[$]{4}\n' '/clave/' file
1
clave
2

I find [$] easier to read than \$, YMMV.
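
To see why anchoring the separator to a whole line matters, here is a minimal sketch with a hypothetical sample in which "$$$$" also appears inside a data line. [$]{4} is spelled out as [$][$][$][$] only so that the example does not depend on interval-expression support in awks other than gawk; the sample text and variable names are illustrative, not from the question.

```shell
# Hypothetical sample: the middle block has "$$$$" embedded in a data line.
sample='$$$$
1
1
$$$$
1
clave $$$$ here
2
$$$$
5
5
$$$$'

# Unanchored separator: any run of four $ splits the record,
# so the matching block is cut at the embedded "$$$$".
truncated=$(printf '%s\n' "$sample" | awk -v RS='[$][$][$][$]' '/clave/')

# Line-anchored separator: only a line consisting of four $ splits records.
whole=$(printf '%s\n' "$sample" | awk -v RS='(^|\n)[$][$][$][$]\n' '/clave/')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$truncated" "$whole"
```

The unanchored command stops the block at the embedded "$$$$", while the anchored one prints all three lines of the block.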
