Oli

Reputation: 15970

How can I search for a multiline pattern in a file?

I needed to find all the files that contained a specific string pattern. The first solution that comes to mind is using find piped with xargs grep:

find . -iname '*.py' | xargs grep -e 'YOUR_PATTERN'

But if I need to find patterns that span more than one line, I'm stuck, because vanilla grep can't find multiline patterns.

Upvotes: 175

Views: 171460

Answers (13)

Jonathan L

Reputation: 10698

As in Amit's answer, you can use awk to print every line from a start pattern to an end pattern. If you also need the line numbers, use the following:

awk '/Start pattern/,/End pattern/ {print NR ":" $0}' filename
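A throwaway demo of the line-number variant, assuming a POSIX shell with mktemp available; the START/END markers and the file contents are made up for illustration.

```shell
set -eu
# Hypothetical sample input; START and END are placeholder markers.
tmp=$(mktemp)
printf '%s\n' 'header' 'START block' 'body line' 'END block' 'footer' > "$tmp"

# Print every line from a START match through the next END match,
# prefixed with its line number (NR) and a colon.
out=$(awk '/START/,/END/ {print NR ":" $0}' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```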

Upvotes: 0

kenorb

Reputation: 166889

Using the ex/vi editor (the range syntax is similar to awk and sed):

ex +"/aaa/,/bbb/p" -R -scq! file.txt

where aaa is your starting point, and bbb is your ending text.

To search recursively, try:

ex +"/aaa/,/bbb/p" -scq! **/*.py

Note: In Bash 4 or later, enable the ** syntax with shopt -s globstar; zsh supports ** out of the box.

Upvotes: 2

pbal

Reputation: 41

perl -ne 'print if (/begin pattern/../end pattern/)' filename

Upvotes: 3

svent

Reputation: 171

You can use the grep alternative sift here (disclaimer: I am the author).

It supports multiline matching and limiting the search to specific file types out of the box:

sift -m --files '*.py' 'YOUR_PATTERN'

(search all *.py files for the specified multiline regex pattern)

It is available for all major operating systems. Take a look at the samples page to see how it can be used to extract multiline values from an XML file.

Upvotes: 4

Martin

Reputation: 51

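@Marcin: here is a non-greedy awk example, which prints from the first Start pattern up to the first End pattern after it and then exits: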

awk '{ if ($0 ~ /Start pattern/) triggered = 1; if (triggered) { print; if ($0 ~ /End pattern/) exit } }' filename
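To see the difference from the plain /Start/,/End/ range pattern, here is a small demo on a made-up file with two blocks: the range pattern prints both blocks, while the exit-based version stops after the first one.

```shell
set -eu
# Hypothetical sample input with two Start..End blocks.
tmp=$(mktemp)
printf '%s\n' 'noise' 'Start A' 'one' 'End A' 'Start B' 'two' 'End B' > "$tmp"

# Range pattern: prints every Start..End block.
all=$(awk '/Start/,/End/' "$tmp")

# Flag + exit: prints from the first Start to the first End after it,
# then stops reading the file, so the second block is never printed.
first=$(awk '{ if ($0 ~ /Start/) triggered = 1; if (triggered) { print; if ($0 ~ /End/) exit } }' "$tmp")

printf '%s\n---\n%s\n' "$all" "$first"
rm -f "$tmp"
```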

Upvotes: 5

Shwaydogg

Reputation: 2509

With the silver searcher (ag):

ag 'abc.*(\n|.)*efg'

The silver searcher's speed optimizations could pay off here.

Upvotes: 11

Oli

Reputation: 15970

Here is a more useful example:

pcregrep -Mi "<title>(.*\n){0,5}</title>" afile.html

It finds the title element in an HTML file even if it spans up to 5 lines.

Here is an example with no limit on the number of lines:

pcregrep -Mi "(?s)<title>.*</title>" example.html 

Upvotes: 22

bukzor

Reputation: 38532

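grep -P also uses libpcre, but is much more widely installed. To find a complete title section of an HTML document, even if it spans multiple lines, add -z so that the whole file is read as one NUL-terminated record (without it, grep matches each line separately and the pattern can never cross a line break), and -o to print only the match:

grep -Pzo '(?s)<title>.*?</title>' example.html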
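To see why -z matters for the title example, here is a small check, assuming GNU grep built with PCRE support (-P); the sample HTML is invented. Without -z, grep matches each line on its own, so (?s) cannot make .* cross a line break; with -z, a file containing no NUL bytes is read as a single record.

```shell
set -eu
# Hypothetical sample HTML with a title split across two lines.
tmp=$(mktemp)
printf '%s\n' '<html><head>' '<title>A title' 'on two lines</title>' '</head></html>' > "$tmp"

# Line-by-line mode: no single line holds both tags, so nothing matches.
if grep -qP '(?s)<title>.*</title>' "$tmp"; then line_mode=found; else line_mode=none; fi

# -z: the whole file is one record; -o: print only the match.
# grep ends each -z output record with a NUL byte, which tr removes.
# .*? is lazy, so the match stops at the first </title>.
match=$(grep -Pzo '(?s)<title>.*?</title>' "$tmp" | tr -d '\0')

printf 'line mode: %s\nmatch:\n%s\n' "$line_mode" "$match"
rm -f "$tmp"
```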

Since the PCRE project follows Perl's regular-expression syntax, use the Perl documentation for reference.

Upvotes: 26

Oli

Reputation: 15970

So I discovered pcregrep, which stands for Perl Compatible Regular Expressions GREP.

The -M option makes it possible to search for patterns that span line boundaries.

For example, to find files where the '_name' variable is followed on the next line by the '_description' variable:

find . -iname '*.py' | xargs pcregrep -M '_name.*\n.*_description'

Tip: you need to include the line break character in your pattern. Depending on your platform, it could be '\n', '\r', '\r\n', ...

Upvotes: 109

ayaz

Reputation: 10510

Here is the same example using GNU grep:

grep -Pzo '_name.*\n.*_description'

-z/--null-data Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.

This has the effect of treating the whole file as one large line (as long as the file contains no NUL bytes). See the -z description in grep's manual, and also common question no. 14 on the Usage page of grep's manual.
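A self-contained run of the command above against a made-up file, assuming GNU grep with -P support. Because -z also makes grep terminate each printed match with a NUL byte, the output is piped through tr to strip it.

```shell
set -eu
# Hypothetical sample input.
tmp=$(mktemp)
printf 'a_name = 1\nb_description = 2\nother\n' > "$tmp"

# -P: PCRE syntax; -z: records end at NUL, so the whole file is one record;
# -o: print only the matched text (followed by a NUL, removed by tr).
match=$(grep -Pzo '_name.*\n.*_description' "$tmp" | tr -d '\0')

printf '%s\n' "$match"
rm -f "$tmp"
```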

Upvotes: 129

Amit

Reputation: 3447

Why not use awk?

awk '/Start pattern/,/End pattern/' filename
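A quick demo of how the range pattern behaves, on an invented file: every Start..End range is printed with its boundary lines included, and a Start with no later End prints through to the end of the file.

```shell
set -eu
# Hypothetical sample input; the second block is never closed.
tmp=$(mktemp)
printf '%s\n' 'x' 'Start 1' 'a' 'End 1' 'y' 'Start 2' 'b' > "$tmp"

# The range turns on at a /Start/ line and off after the next /End/ line.
out=$(awk '/Start/,/End/' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```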

Upvotes: 123

TransferOrbit

Reputation: 227

I believe the following should work. It has the advantage of using only extended regular expressions, so there is no need to install an extra tool such as pcregrep if you don't have it yet, or if your grep lacks the -P option (e.g. macOS):

egrep -irzo ".*aaa(.*\s.*){1,}.*bbb.*" path_to_filenames

Caveat emptor: this has some slight disadvantages:

  • it will find the largest selection of lines, from the first aaa to the last bbb in each file,
  • which matters when there are several repetitions of the aaa [stuff] bbb pattern in a file.

Upvotes: 0

albfan

Reputation: 12970

This answer might be useful:

Regex (grep) for multi-line search needed

To search recursively, you can use the flags -R (recursive) and --include (GLOB pattern). See:

Use grep --exclude/--include syntax to not grep through certain files
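Combining the recursive flags with the -Pz trick from ayaz's answer, here is a sketch that lists the *.py files containing the multiline pattern, assuming GNU grep with -P support (the directory tree is hypothetical).

```shell
set -eu
# Hypothetical tree: one matching .py file, one matching non-.py file,
# and one .py file without the pattern.
dir=$(mktemp -d)
mkdir -p "$dir/pkg/sub"
printf 'a_name = 1\nb_description = 2\n' > "$dir/pkg/sub/match.py"
printf 'a_name = 1\nb_description = 2\n' > "$dir/pkg/match.txt"
printf 'a_name = 1\n' > "$dir/pkg/nomatch.py"

# -r: recurse; --include: only files whose names match the glob;
# -l: print the names of matching files; -P -z: multiline PCRE match.
# tr removes any NUL bytes from the output, just in case.
out=$(cd "$dir" && grep -rlPz --include='*.py' '_name.*\n.*_description' . | tr -d '\0')

printf '%s\n' "$out"
rm -rf "$dir"
```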

Upvotes: 5
