Martin Preusse

Reputation: 9369

Select full block of text delimited by some chars

I have a very large text file (40GB gzipped) where blocks of data are separated by //.

How can I select the blocks of data in which a certain line matches some criterion? That is, can I grep for a pattern and extend the selection in both directions to the // delimiters? I can make no assumptions about the size of the block or the position of the matching line within it.

not interesting 1
not interesting 2
//
get the whole block 1
MATCH THIS LINE
get the whole block 2
get the whole block 3
//
not interesting 1
not interesting 2
//

I want to select the block of data with MATCH THIS LINE:

get the whole block 1
MATCH THIS LINE
get the whole block 2
get the whole block 3

I tried sed but can't get my head around the pattern definition. This, for example, should match from // to MATCH THIS LINE:

sed -n -e '/\/\//,/MATCH THIS LINE/ p' file.txt

But it fails to match the //.

Is it possible to achieve this with GNU command line tools?

Upvotes: 2

Views: 338

Answers (1)

fedorqui

Reputation: 290075

With GNU awk (needed because a multi-character RS is not guaranteed to work in other awks), you can set the record separator RS to //, so that every record is one //-delimited block of text:

$ awk -v RS="//" '/MATCH THIS LINE/' file

get the whole block 1
MATCH THIS LINE
get the whole block 2
get the whole block 3

Note that this leaves an empty line above and below the block: the record starts with the newline that follows the opening //, and it still contains the newline before the closing //, to which print adds its own. To remove the empty lines, you can pipe the output to awk 'NF'.
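
If the empty lines are unwanted, or the awk at hand is not GNU awk, an alternative is to read line by line and collect lines until a line consisting only of // is seen. This is a different technique from the RS approach above, sketched under the assumption that the delimiter always sits alone on its own line, as in the sample. The function name select_blocks and the temporary sample file are made up for illustration:

```shell
# select_blocks PATTERN [FILE...]: print every //-delimited block that
# contains a line matching PATTERN (hypothetical helper, plain POSIX awk).
select_blocks() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        $0 == "//" {                 # a delimiter line closes the current block
            if (hit) printf "%s", block
            block = ""; hit = 0
            next
        }
        { block = block $0 "\n" }    # accumulate the lines of the block
        $0 ~ pat { hit = 1 }         # remember that this block matched
        END { if (hit) printf "%s", block }   # last block may lack a trailing //
    ' "$@"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'not interesting 1' 'not interesting 2' '//' \
    'get the whole block 1' 'MATCH THIS LINE' 'get the whole block 2' \
    'get the whole block 3' '//' 'not interesting 1' 'not interesting 2' '//' > "$sample"

select_blocks 'MATCH THIS LINE' "$sample"
rm -f "$sample"
```

Only one block is held in memory at a time, so the size of the whole file should not matter. A side effect of comparing whole lines is that a // in the middle of a line (a URL, for instance) is not treated as a delimiter, whereas RS="//" would split there.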

To also print the separator around each matching block, you can use RT, which GNU awk sets to the text that matched RS (thanks 123):

awk -v RS="//" '/MATCH THIS LINE/{print RT $0 RT}' file
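
Since the file in the question is gzipped, it may help to see the filter reading from a pipe, so that the 40GB never has to be decompressed to disk. The sketch below assumes gzip is installed, uses a made-up file name (data.txt.gz), and again matches // as a whole line with plain awk instead of relying on the GNU-only RT; each selected block is followed by its closing //:

```shell
# Build a small gzipped sample (hypothetical file name).
tmpdir=$(mktemp -d)
printf '%s\n' 'skip 1' '//' 'keep 1' 'MATCH THIS LINE' 'keep 2' '//' 'skip 2' '//' \
    | gzip -c > "$tmpdir/data.txt.gz"

# Decompress to stdout and filter on the fly.
selected=$(gzip -dc "$tmpdir/data.txt.gz" | awk '
    $0 == "//" {                       # end of block: emit it with its delimiter
        if (hit) printf "%s//\n", block
        block = ""; hit = 0
        next
    }
    { block = block $0 "\n" }          # collect the current block
    /MATCH THIS LINE/ { hit = 1 }      # flag blocks containing the pattern
    END { if (hit) printf "%s", block }  # unterminated last block, if it matched
')

printf '%s\n' "$selected"
rm -rf "$tmpdir"
```

The same pipe works with the GNU awk one-liners above: replace the awk program with the RS="//" version and drop the file argument so that awk reads standard input.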

Upvotes: 5
