Reputation: 13
I have a file that I'd like to process in bash, using awk, sed, grep or similar. A single line of the file contains multiple occurrences of text delimited by pattern1 and pattern2. I would like to extract each occurrence, from pattern1 through the next pattern2, and print each one on a separate line.
I have already tried using this:
cat file.txt | grep -o 'pattern1.*pattern2'
But because .* is greedy, this prints a single match running from the first pattern1 to the very last pattern2 on the line.
$ cat file.txt
pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.
I'd like to get:
pattern1 this is the first content pattern2
pattern1 this is the second content pattern2
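The difference between the attempt and the desired output comes down to greedy versus lazy matching. As a minimal sketch, assuming a GNU grep built with PCRE support (the -P option, which is not available in every grep), a lazy quantifier stops at the nearest pattern2 instead of the last one. The variable names here are only for illustration:

```shell
line='pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.'

# Greedy: .* runs to the last pattern2 on the line, so there is one long match.
greedy=$(printf '%s\n' "$line" | grep -o 'pattern1.*pattern2')

# Lazy: .*? stops at the nearest pattern2 (assumes PCRE support via -P).
lazy=$(printf '%s\n' "$line" | grep -oP 'pattern1.*?pattern2')

printf '%s\n' "$lazy"
```

If -P is not available, the sed and awk approaches in the answers avoid the need for lazy quantifiers.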
Upvotes: 0
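Try GNU sed (note that this only handles a line with exactly two pattern1 ... pattern2 pairs and nothing before the first pattern1):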
sed -E 's/(pattern2).*(pattern1)(.*\1).*/\1\n\2\3/' file.txt
Upvotes: 0
Reputation: 58468
This might work for you (GNU sed):
sed -n '/pattern1.*pattern2/{s/pattern1/\n&/;s/.*\n//;s/pattern2/&\n/;P;D}' file
Set the option -n to print explicitly.
Only process lines that contain pattern1 followed by pattern2.
Prepend a newline to the first pattern1.
Remove everything up to and including the introduced newline.
Append a newline following the first pattern2.
Print the first line in the pattern space, delete it and repeat.
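The steps above can be sketched as a multi-line script, one sed command per step, run against the sample line from the question. This assumes GNU sed (for \n in the s command); the variable names line and result are only for illustration:

```shell
# Sample line from the question (two pattern1 ... pattern2 pairs).
line='pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.'

# Same commands as the one-liner, one per line (GNU sed assumed).
result=$(printf '%s\n' "$line" | sed -n '
/pattern1.*pattern2/{
  # mark the start of the first pair with a newline
  s/pattern1/\n&/
  # drop everything up to and including that newline
  s/.*\n//
  # mark the end of the pair with a newline
  s/pattern2/&\n/
  # print up to the first newline
  P
  # delete up to the first newline and rerun the script on what is left
  D
}')
printf '%s\n' "$result"
```

Because D restarts the script without reading a new line while the pattern space is non-empty, the same commands are applied to the remainder until no pattern1 ... pattern2 pair is left.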
Upvotes: 0
Reputation: 204054
In case you don't have access to tools that support lookarounds, this approach, though lengthy, will work robustly using standard tools on any UNIX box:
awk '{
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C"); gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    out = ""
    while( match($0,/{[^{}]*}/) ) {
        out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    gsub(/}/,"pattern2"); gsub(/{/,"pattern1"); gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
} 1' file
The above works by creating characters that can't exist in the input: it first changes any literal { and } to the strings @B and @C (after changing @ to @A so those strings are unambiguous), then replaces pattern1 with { and pattern2 with }. It can then use those chars in a negated character class to find the target strings, and finally it returns all the changed chars to their original values. Here it is with some prints to make it more obvious what's happening at each step:
awk '{
    print "1): " $0 ORS
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C"); gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    print "2): " $0 ORS
    out = ""
    while( match($0,/{[^{}]*}/) ) {
        out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    print "3): " $0 ORS
    gsub(/}/,"pattern2"); gsub(/{/,"pattern1"); gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
    print "4): " $0 ORS
} 1' file
1): pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.

2): { this is the first content } this is some other stuff { this is the second content } this is the end of the file.

3): { this is the first content }
{ this is the second content }

4): pattern1 this is the first content pattern2
pattern1 this is the second content pattern2

pattern1 this is the first content pattern2
pattern1 this is the second content pattern2
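To see why the @A/@B/@C round trip matters, here is a small sketch that feeds the same approach a line containing literal braces and an @. The function name extract_pairs and the sample input are made up for illustration, and the braces are written as [{] and [}] in the regexps, on the assumption that a bracket expression sidesteps any ambiguity about an unescaped brace in a given awk:

```shell
# Hypothetical helper wrapping the awk approach; reads lines on stdin.
extract_pairs() {
  awk '{
    # protect literal @, { and } so that { and } are free to act as markers
    gsub(/@/,"@A"); gsub(/[{]/,"@B"); gsub(/[}]/,"@C")
    gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    out = ""
    # collect each { ... } span that contains no other brace
    while ( match($0,/[{][^{}]*[}]/) ) {
      out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
      $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    # undo the substitutions in reverse order
    gsub(/[}]/,"pattern2"); gsub(/[{]/,"pattern1")
    gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
  } 1'
}

# Made-up input with literal braces and an @ inside the first pair.
result=$(printf '%s\n' 'pattern1 a{b}@c pattern2 junk pattern1 d pattern2 tail' | extract_pairs)
printf '%s\n' "$result"
```

With this input the two extracted lines should be "pattern1 a{b}@c pattern2" and "pattern1 d pattern2", with the braces and the @ coming back unchanged.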
Upvotes: 0