Reputation: 9369
I have a very large text file (40GB gzipped) where blocks of data are separated by //.
How can I select blocks of data where a certain line matches some criterion? That is, can I grep a pattern and extend the selection in both directions to the // delimiter? I can make no assumptions about the size of the block or the position of the line.
not interesting 1
not interesting 2
//
get the whole block 1
MATCH THIS LINE
get the whole block 2
get the whole block 3
//
not interesting 1
not interesting 2
//
I want to select the block of data with MATCH THIS LINE:
get the whole block 1
MATCH THIS LINE
get the whole block 2
get the whole block 3
I tried sed but can't get my head around the pattern definition. This, for example, should match from // to MATCH THIS LINE:
sed -n -e '/\/\//,/MATCH THIS LINE/ p' file.txt
But it fails to match the //.
Is it possible to achieve this with GNU command line tools?
Upvotes: 2
Views: 338
Reputation: 290075
With GNU awk (needed because of the multi-character RS), you can set the record separator to //, so that every record is a //-delimited set of characters:
$ awk -v RS="//" '/MATCH THIS LINE/' file
get the whole block 1
MATCH THIS LINE
get the whole block 2
get the whole block 3
Note that this leaves an empty line above and below the block, because each record keeps the newline that follows the opening // and the one that precedes the closing //. To remove them, you can pipe the output to awk 'NF'.
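Since the file in the question is gzipped, the two steps above can be chained onto a decompression stream so the 40GB file is never unpacked to disk. The following is only a sketch: the function name extract_blocks and its arguments are made up for illustration, and it assumes an awk that accepts a multi-character RS (GNU awk does). Keep in mind that awk 'NF' drops every blank line, including any that occur inside a matching block.

```shell
# extract_blocks FILE.gz PATTERN   (hypothetical helper, not from the original answer)
# Streams the decompressed file through awk, one //-delimited record at a time.
extract_blocks() {
    gzip -dc "$1" |                              # decompress to stdout
        awk -v RS='//' -v pat="$2" '$0 ~ pat' |  # print records that match PATTERN
        awk 'NF'                                 # drop the empty lines around each block
}

# Example call (file name is a placeholder):
#   extract_blocks file.txt.gz 'MATCH THIS LINE'
```

Because awk only holds one record in memory at a time, memory use is bounded by the largest block rather than by the file size.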
To print the separator between blocks of data you can say (thanks 123):
awk -v RS="//" '/MATCH THIS LINE/{print RT $0 RT}' file
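If GNU awk is not available, or if // can also appear inside a data line (in a URL, for instance), one alternative is to buffer lines and treat only a line consisting of exactly // as the delimiter. This is a different technique from the RS approach above, sketched under those assumptions; select_blocks is a hypothetical name, and the pattern is passed with -v, so backslashes in it are subject to awk's escape processing.

```shell
# select_blocks PATTERN [FILE...]   (hypothetical helper; reads stdin if no FILE is given)
# Prints every block that contains a line matching PATTERN, without the // lines.
select_blocks() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        $0 == "//" {                   # delimiter line: the current block is complete
            if (hit) printf "%s", block
            block = ""; hit = 0
            next
        }
        {
            block = block $0 "\n"      # buffer the current line
            if ($0 ~ pat) hit = 1      # remember that this block contains a match
        }
        END { if (hit) printf "%s", block }   # final block may have no trailing //
    ' "$@"
}

# Example call on a compressed file (file name is a placeholder):
#   gzip -dc file.txt.gz | select_blocks 'MATCH THIS LINE'
```

This version produces no extra empty lines, so the awk 'NF' step is not needed.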
Upvotes: 5