user130268

Reputation: 1361

awk: how to set the record separator to multiple consecutive lines that are empty or contain only space and/or tab characters?

I know I can use RS="" to make one or more consecutive empty lines the record separator. However, if those lines contain space or tab characters, it does not work. I'm thinking of setting RS to some kind of regular expression to do the match, but that seems hard, since in this case \n is often used as the field separator FS. Any suggestions?
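A minimal reproduction of the problem, assuming tiny hypothetical inputs built with printf (the sample data is illustrative, not taken from the question). Paragraph mode splits on a truly empty line but not on a line that holds only spaces:

```shell
#!/bin/sh
# Hypothetical sample data built with printf; line 3 is the intended separator.

# Separator line is truly empty: RS="" (paragraph mode) sees two records.
printf 'a\n  b\n\n c\n' | awk -v RS='' '{print NR, "<" $0 ">"}'

# Separator line holds three spaces: it is not empty, so RS="" does not
# split there and all four lines come back as record 1.
printf 'a\n  b\n   \n c\n' | awk -v RS='' '{print NR, "<" $0 ">"}'
```

Only plain POSIX awk is used here, so it should behave the same in gawk, mawk, or BSD awk.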

Upvotes: 1

Views: 1131

Answers (2)

Ed Morton

Reputation: 204164

With GNU awk for multi-char RS:

awk -v RS='\n(([[:space:]]*\n)+|$)' '{print NR, "<" $0 ">"}' file

e.g.

$ awk '{print NR, "<" $0 ">"}' file
1 <a>
2 <  b>
3 <   >
4 < c>

$ awk -v RS='\n(([[:space:]]*\n)+|$)' '{print NR, "<" $0 ">"}' file
1 <a
  b>
2 < c>

Upvotes: 2

Jotne

Reputation: 41460

Here is a way to do it:

awk '!NF {$0=""}1' file | awk -v RS="" '{print NR,$0}'

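The first awk checks how many fields (NF) each line has. With the default FS this is 0 for empty lines and for lines containing only spaces and tabs, and for those lines it sets $0 to the empty string; the trailing 1 then prints every line. After this, the second awk can use RS="" (paragraph mode).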
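If running two awk processes is undesirable, the same idea (treat any line with NF == 0 as a separator) can be done in a single pass with plain POSIX awk. This is an alternative sketch, not part of the answer above; the function name split_blocks is hypothetical, and the output format imitates {print NR,$0}:

```shell
#!/bin/sh
# split_blocks (hypothetical name): print each block of input as
# "<number> <block>", where blocks are separated by one or more lines that
# are empty or contain only spaces/tabs. POSIX awk only, no multi-char RS.
split_blocks() {
    awk '
        NF {                                # non-blank line: append to current block
            rec = (rec == "") ? $0 : (rec "\n" $0)
            next
        }
        rec != "" {                         # first blank line after a block ends it
            print ++n, rec
            rec = ""
        }
        END {
            if (rec != "") print ++n, rec   # flush the final block
        }
    ' "$@"
}
```

Usage: split_blocks file, or some_command | split_blocks. Leading and repeated blank lines are skipped, so no empty records are printed.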


Here is a GNU awk version (GNU awk is required because RS is more than one character):

awk -v RS="\n([[:space:]]*\n)+" '{print NR,$0}' file

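It may also work without the parentheses. Since [[:space:]] matches newline as well as space and tab, [[:space:]]* can already span several blank or whitespace-only lines on its own, so this should cover the same cases: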

awk -v RS="\n[[:space:]]*\n+" '{print NR,$0}' file

Upvotes: 4
