Reputation: 338
I'm searching for a fast method to find all files in a folder which contain 2 or more patterns.
grep -l -e foo -e bar ./*
or
rg -l -e foo -e bar
both list every file that contains 'foo' OR 'bar', whether the two appear on the same line or on different lines. I want only files that have at least one 'foo' match AND at least one 'bar' match on different lines. Files which have only 'foo' matches or only 'bar' matches should be filtered out.
I know I could chain the grep calls but this will be too slow.
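To pin down the requirement, here is a minimal Python sketch of the intended semantics. It is a reference for checking results, not a fast solution. The function name `has_both_on_different_lines` is hypothetical, and plain substring tests stand in for real patterns.

```python
def has_both_on_different_lines(text, first="foo", second="bar"):
    """True if some line contains `first` and a different line contains `second`."""
    lines = text.splitlines()
    first_hits = {i for i, line in enumerate(lines) if first in line}
    second_hits = {i for i, line in enumerate(lines) if second in line}
    if not first_hits or not second_hits:
        return False
    # Both sets are non-empty; a pair of distinct line numbers exists
    # unless both sets are the same single line.
    return len(first_hits | second_hits) > 1
```

Under this reading, a file whose only matches are 'foo' and 'bar' on the same single line does not qualify.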
Upvotes: 22
Views: 14224
Reputation: 1076
Expanding on @Chad Baldwin's answer. On Mac you'll soon reach the shell argument limit. Use xargs to resolve this (-0 makes rg separate the paths with NUL bytes, so paths containing spaces survive the pipe):
$ rg -l0 "my first match" | xargs -0 rg "my second match"
If you want to find N matches:
$ rg -l0 "my first match" | xargs -0 rg -l0 "my second match" | ... | xargs -0 rg "my final match"
Upvotes: 0
Reputation: 1
rg 'text1|text2'
This way files containing text1 or text2 will be found. Note that this is an OR match, so a file containing only one of the two is listed as well.
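A caveat about the `|` operator: in regular expressions it is alternation, meaning "either side". The sketch below uses Python's `re` module as a stand-in for ripgrep's regex engine (an assumption, though alternation behaves the same way in both), and the sample strings are made up for illustration.

```python
import re

pattern = re.compile(r"text1|text2")

def file_matches(text):
    """True if any line matches the alternation, i.e. text1 OR text2."""
    return any(pattern.search(line) for line in text.splitlines())

only_first = "here is text1\nnothing else\n"
both = "here is text1\nand here is text2\n"
neither = "nothing to see\n"

for name, text in [("only_first", only_first), ("both", both), ("neither", neither)]:
    print(name, file_matches(text))
```

The file with only text1 matches too, so alternation alone cannot express "contains both".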
Upvotes: 0
Reputation: 86
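You can add the following function (tested in zsh):
multisearch() {
  case $# in
    0) return 1 ;;
    1) rg $1; return ;;   # single pattern: plain search, then stop
  esac
  local lastArg=${@[${#}]}
  # (f) splits rg's output on newlines, so paths with spaces stay intact
  local files=(${(f)"$(rg --files-with-matches ${1})"})
  (( ${#files} )) || return 0
  # narrow the file list with every pattern except the first and the last
  for arg in ${@:2:$(( $# - 2 ))}; do
    files=(${(f)"$(rg --files-with-matches ${arg} ${files[@]})"})
    (( ${#files} )) || return 0
  done
  # show the matching lines of the last pattern in the remaining files
  rg ${lastArg} ${files[@]}
}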
and use like:
$ multisearch foo bar
Upvotes: 2
Reputation: 1373
rg with multiline does work, however it prints everything in between the criteria as the result, and sometimes that's not useful.
For the use case of chaining searches (in e.g. html, json, etc.), where the 1st criterion is just to narrow down the files and the 2nd criterion is what I am actually looking for, this is a possible solution:
rg -0 -l crit1 | xargs -0 -I % rg -H crit2 %
Alternatively, I have just discovered ugrep, which supports combining multiple criteria using boolean operators on both line and file level. This is quite something. It's a bit slower than rg + xargs, however it nicely prints all lines matching any of the criteria from the files that match all of them (instead of just showing the lines for the last criterion, as above):
ugrep --files -e crit1 --and -e crit2
Upvotes: 10
Reputation: 2602
So this doesn't perfectly answer the question, but, this is the StackOverflow question that pops up every time I google "ripgrep multiple patterns". So I'm leaving my answer here for the future googler (including myself)...
I primarily work in PowerShell, so this is how I perform an "and" search in ripgrep in PowerShell. This will also match same-line matches, which is why it's not a perfect answer, but it will identify files that match both patterns, and it runs relatively quickly:
rg -l 'SecondSearchPattern' (rg -l 'FirstSearchPattern')
Explanation:
First the parens run: rg -l 'FirstSearchPattern', which searches all files for the pattern FirstSearchPattern. By using -l it returns a list of file paths only.
By placing it in (parentheses), PowerShell runs the whole command first, then "splats" the results of the command into the outer rg command.
The outer rg command is now run like this:
rg -l 'SecondSearchPattern' "file.txt" "directory\file.txt"
And yes, it does put them into quotes, so it handles paths with spaces. This searches all provided files for the pattern SecondSearchPattern, thus returning only files that match both patterns.
You can go one step further and add on | Get-Item (| gi) to return filesystem objects, and | % FullName to get the full path.
rg -l 'SecondSearchPattern' (rg -l 'FirstSearchPattern') | gi | % FullName
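To illustrate why chaining finds "both patterns somewhere in the file" but cannot exclude same-line matches, here is a small in-memory Python sketch. The helper names `files_with` and `chained` are hypothetical, substring tests stand in for rg, and the sample contents are illustrative.

```python
def files_with(pattern, files):
    """Mimic `rg -l PATTERN FILES`: names of files whose text contains pattern."""
    return [name for name, text in files.items() if pattern in text]

def chained(first, second, files):
    """Search for `second` only in the files that already matched `first`."""
    narrowed = {name: files[name] for name in files_with(first, files)}
    return files_with(second, narrowed)

sample = {
    "f1": "afoot\n2bar\n",   # foo and bar on different lines
    "f2": "foo bar\n",       # foo and bar on the same line only
    "f3": "foot\n",          # foo only
    "f4": "bar\n",           # bar only
}

print(chained("foo", "bar", sample))  # ['f1', 'f2']
```

f2 is reported even though its two matches share one line, which is the limitation mentioned at the top of this answer.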
Upvotes: 19
Reputation: 76634
If you want to search for two or more words that occur on multiple lines, you can use ripgrep's option --multiline-dotall, in addition to providing -U/--multiline. You also need to search for foo before bar and bar before foo using the | operator:
rg -lU --multiline-dotall 'foo.*bar|bar.*foo' .
For any number of words you'll need to | all permutations of those words. For that I use a small Python script (which I called rga) which searches the current directory (and downwards) for files that contain all arguments given on the commandline:
#! /opt/util/py310/bin/python
import sys
import subprocess
from itertools import permutations
# Build 'a.*b|b.*a|...' from every ordering of the command-line arguments.
rgarg = '|'.join(('.*'.join(x) for x in permutations(sys.argv[1:])))
# -l lists matching files only; -U with --multiline-dotall lets '.' span newlines.
cmd = ['rg', '-lU', '--multiline-dotall', rgarg, '.']
# print(' '.join(cmd))
proc = subprocess.run(cmd, capture_output=True)
sys.stdout.write(proc.stdout.decode('utf-8'))
I have searched successfully with six arguments; above that the commandline becomes too long. There are probably ways around that by saving the pattern to a file and passing it with -f file_name, but I never needed/investigated that.
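The reason the commandline grows so quickly is that n words have n! orderings. The following sketch isolates the pattern-building step of the script so the growth can be inspected without running rg. The function name `build_pattern` is hypothetical, and like the script it does not escape regex metacharacters in the words.

```python
from itertools import permutations

def build_pattern(words):
    """Join every ordering of `words` with '.*', then OR the orderings together."""
    return "|".join(".*".join(p) for p in permutations(words))

print(build_pattern(["foo", "bar"]))  # foo.*bar|bar.*foo

# Count the alternatives for 1..6 single-letter words.
for n in range(1, 7):
    words = list("abcdef"[:n])
    print(n, build_pattern(words).count("|") + 1)
```

Six words already produce 720 alternatives, and a seventh would produce 5040.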
Upvotes: 6
Reputation: 23667
$ cat f1
afoot
2bar
$ cat f2
foo bar
$ cat f3
foot
$ cat f4
bar
$ cat f5
barred
123
foo3
$ rg -Ul '(?s)foo.*?\n.*?bar|bar.*?\n.*?foo'
f5
f1
You can use the -U option to match across lines. The s flag will enable . to match newlines as well. Since you want the matches to be across different lines, you need to match a newline character in between the search terms as well.
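The same pattern can be checked without creating files. This sketch uses Python's `re` module as a stand-in for ripgrep's engine (an assumption; the constructs used here, `(?s)`, lazy `.*?` and `\n`, behave the same in both) and feeds it the contents of f1 to f5 from above.

```python
import re

# (?s) lets '.' match newlines; the literal \n forces the two terms onto different lines.
pattern = re.compile(r"(?s)foo.*?\n.*?bar|bar.*?\n.*?foo")

samples = {
    "f1": "afoot\n2bar\n",
    "f2": "foo bar\n",
    "f3": "foot\n",
    "f4": "bar\n",
    "f5": "barred\n123\nfoo3\n",
}

matching = sorted(name for name, text in samples.items() if pattern.search(text))
print(matching)  # ['f1', 'f5']
```

f2 is rejected because nothing after its only newline can supply the second term.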
Upvotes: 4