RattleZ

Reputation: 13

How can I print multiple patterns on separate lines

I have a file that I'd like to process in bash, using awk, sed, grep or a similar tool. A single line of the file contains several occurrences of a start pattern (pattern1) and an end pattern (pattern2). I would like to extract everything from each pattern1 to the next pattern2 and print each match on a separate line.

I have already tried using this:

cat file.txt | grep -o 'pattern1.*pattern2'

But because .* is greedy, this prints a single match running from the first pattern1 to the very last pattern2.

$ cat file.txt
pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.

I'd like to get:

pattern1 this is the first content pattern2
pattern1 this is the second content pattern2

Upvotes: 0

Views: 527

Answers (3)

user7712945

Reputation:

Try GNU sed (this handles exactly two pattern1 ... pattern2 pairs on a line, as in the sample):

 sed -E 's/(pattern2).*(pattern1)(.*\1).*/\1\n\2\3/' file.txt

Upvotes: 0

potong

Reputation: 58468

This might work for you (GNU sed):

sed -n '/pattern1.*pattern2/{s/pattern1/\n&/;s/.*\n//;s/pattern2/&\n/;P;D}' file

Set the option -n to print explicitly.

Only process lines that contain pattern1 followed by pattern2.

Prepend a newline to pattern1.

Remove everything up to and including the introduced newline.

Append a newline following pattern2.

Print the first line in the pattern space, delete it and repeat.
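
The steps above can be checked against the sample line from the question. This is only a minimal sketch, assuming GNU sed (for \n in the replacement text, and for D restarting the cycle without reading new input while a newline remains in the pattern space). The function name extract_spans and the temporary file are made up for illustration:

```shell
# Hypothetical helper: print each pattern1...pattern2 span of a file on its own line.
# Assumes GNU sed.
extract_spans() {
    sed -n '/pattern1.*pattern2/{s/pattern1/\n&/;s/.*\n//;s/pattern2/&\n/;P;D}' "$1"
}

# Sample input taken from the question.
tmp=$(mktemp)
printf '%s\n' 'pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.' > "$tmp"

# Each pass through the script prints one span; D then restarts the script
# on whatever text followed that span, until no pattern1...pattern2 is left.
extract_spans "$tmp"
rm -f "$tmp"
```

The trailing text after the last pattern2 no longer matches the address, so with -n it is silently dropped rather than printed.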

Upvotes: 0

Ed Morton

Reputation: 204054

In case you don't have access to tools that support lookarounds, this approach, though lengthy, will work robustly using standard tools on any UNIX box:

awk '{
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C"); gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    out = ""
    while( match($0,/{[^{}]*}/) ) {
        out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    gsub(/}/,"pattern2"); gsub(/{/,"pattern1"); gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
} 1' file

The above works by making { and } characters that can't exist in the input: it first changes any literal @, { and } to the strings @A, @B and @C, then replaces pattern1 with { and pattern2 with }. It can then use those characters in a negated bracket expression to find each target string, and finally it returns all the changed characters to their original values. Here it is with some prints to make it more obvious what's happening at each step:

awk '{
    print "1): " $0 ORS
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C"); gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
    print "2): " $0 ORS
    out = ""
    while( match($0,/{[^{}]*}/) ) {
        out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    $0 = out
    print "3): " $0 ORS
    gsub(/}/,"pattern2"); gsub(/{/,"pattern1"); gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
    print "4): " $0 ORS
} 1' file
1): pattern1 this is the first content pattern2 this is some other stuff pattern1 this is the second content pattern2 this is the end of the file.

2): { this is the first content } this is some other stuff { this is the second content } this is the end of the file.

3): { this is the first content }
{ this is the second content }

4): pattern1 this is the first content pattern2
pattern1 this is the second content pattern2

pattern1 this is the first content pattern2
pattern1 this is the second content pattern2
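
To see why the escaping matters, the same escape-and-restore idea can be run on a line that already contains literal @, { and } characters. This is a sketch rather than a definitive version: the function name extract_with_awk and the test line are invented for illustration, and the braces are written as bracket expressions ([{] and [}]) purely as a precaution, on the assumption that some awk implementations may treat a bare { in a regex as the start of an interval expression.

```shell
# Hypothetical helper: read lines on stdin, print each pattern1...pattern2 span
# on its own line, leaving literal @, { and } in the input untouched.
extract_with_awk() {
    awk '{
        # Escape @, { and } so that { and } are free to act as one-character markers.
        gsub(/@/,"@A"); gsub(/[{]/,"@B"); gsub(/[}]/,"@C")
        gsub(/pattern1/,"{"); gsub(/pattern2/,"}")
        out = ""
        # Collect each {...} span that contains no other marker.
        while ( match($0,/[{][^{}]*[}]/) ) {
            out = (out=="" ? "" : out ORS) substr($0,RSTART,RLENGTH)
            $0 = substr($0,RSTART+RLENGTH)
        }
        $0 = out
        # Undo the marker substitution first, then the escaping, in reverse order.
        gsub(/[}]/,"pattern2"); gsub(/[{]/,"pattern1")
        gsub(/@C/,"}"); gsub(/@B/,"{"); gsub(/@A/,"@")
    } 1'
}

# Made-up input with literal braces and @ signs inside the spans.
printf '%s\n' 'x pattern1 a{b}@c pattern2 y pattern1 d@Be pattern2 z' | extract_with_awk
```

Restoring @A last is what keeps an input string such as @B from being mistaken for an escaped brace: it travels through the script as @AB, which the @B substitution does not match.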

Upvotes: 0
