fedorqui

Reputation: 290105

How to set a multicharacter record separator RS in GNU awk so it encompasses the new lines?

I am using GNU Awk 4.1.3. I want to process this file:

$$$$
1
1
$$$$
2
2
$$$$
3
3
$$$$
1
clave
2
$$$$
5
5
$$$$

I want to print the block of lines between one "$$$$" and the next whenever that block contains the text "clave". With the example above, the expected output is:

1
clave
2

My solution is to set the record separator RS to the string "$$$$". Since $ is a regex metacharacter, I need to escape it, so it ends up as RS='\\$\\$\\$\\$':

awk -v RS='\\$\\$\\$\\$' '/clave/' file

The problem with this is that the result contains a new line before and after the block:

$ awk -v RS='\\$\\$\\$\\$' '/clave/' file

1
clave
2

This is because there is a new line between the end of "$$$$" and "1", and there is also a new line between "2" and the next "$$$$".
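
One way to make those stray newlines visible is to wrap the matching record in brackets. This is only a minimal sketch: it assumes the sample input above is recreated in a temporary file, and the names `sample` and `show_record` are illustrative, not part of the original post.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' '$$$$' 1 1 '$$$$' 2 2 '$$$$' 3 3 '$$$$' 1 clave 2 '$$$$' 5 5 '$$$$' > "$sample"

# Print the matching record between brackets, so that any newline kept
# at the start or end of the record shows up in the output.
show_record() {
    awk -v RS='\\$\\$\\$\\$' '/clave/ { printf "[%s]\n", $0 }' "$1"
}

show_record "$sample"
```

The opening bracket ends up alone on its own line and so does the closing one, which shows that the record itself starts and ends with a newline.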

To avoid this, I am adding a newline on both ends of the record separator, so it becomes RS='\n\\$\\$\\$\\$\n'. It works well:

$ awk -v RS='\n\\$\\$\\$\\$\n' '/clave/' file
#            ^^^           ^^
1
clave
2

However, this becomes quite complex, and I wonder whether including the newline in the record separator may have side effects that I am not aware of.

So my question is: how can I set the record separator so that it encompasses the newlines? Is my approach valid, or does it have drawbacks that call for a different option?

Upvotes: 4

Views: 281

Answers (2)

anubhava

Reputation: 785631

You are getting a newline before and after the block because your file has a newline before and after each $$$$; with RS set to just $$$$, those line breaks remain in the record.

Change your RS so that it also matches either a newline or the start of the input before the $$$$, and either a newline or the end of the input after it. That way the record no longer contains those line breaks:

awk -v RS='(^|\n)\\${4}(\n|$)' '/clave/' file

1
clave
2

Also note that you can use the fixed-length quantifier \\${4} instead of \\$\\$\\$\\$.

Upvotes: 2

Ed Morton

Reputation: 204164

You should match on the newline before and after the 4 $s, as THAT is the real separator (a string of 4 $s on a line of its own); anything else could fail if 4 $s appeared in your data. The first string of $s won't have a newline before it, of course; it will match the start-of-string anchor (^) instead, so you need to use:

$ awk -v RS='(^|\n)[$]{4}\n' '/clave/' file
1
clave
2

I find [$] easier to read than \\$, YMMV.
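
To see what the start-of-string alternative changes, the following sketch counts how many records still contain a literal $ under each separator. It assumes the question's sample input recreated in a temporary file; `sample` and `leaked` are hypothetical names, and the {4} is written out as four bracket expressions only so that the check does not depend on interval-expression support in whichever awk runs it.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' '$$$$' 1 1 '$$$$' 2 2 '$$$$' 3 3 '$$$$' 1 clave 2 '$$$$' 5 5 '$$$$' > "$sample"

# Count the records that still contain a literal "$" for a given RS.
# Usage: leaked RS_REGEX FILE
leaked() {
    awk -v RS="$1" '/[$]/ { n++ } END { print n + 0 }' "$2"
}

# Newline required on both sides: the very first $$$$ has no newline
# before it, so it is not consumed and stays inside the first record.
leaked '\n[$][$][$][$]\n' "$sample"        # prints 1

# Start of input also accepted: every $$$$ line is consumed.
leaked '(^|\n)[$][$][$][$]\n' "$sample"    # prints 0
```

With the first separator, the leftover $$$$ goes unnoticed in the question's example only because the first block does not contain "clave"; if it did, the printed block would begin with the $$$$ line.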

Upvotes: 3
