RayBacker

Reputation: 176

How to use FINDSTR in PowerShell to find lines where all words in the search string match in any order

The following findstr.exe command almost does what I want, but not quite:

findstr /s /i /c:"word1 word2 word3" *.abc

With /c:, the above looks for word1 word2 word3 as a single literal string, and therefore only finds the words in that exact order.

By contrast, I want all words to match individually, in any order (AND logic, conjunction).

If I remove /c: from the command above, then lines matching any of the words are returned (OR logic, disjunction), which is not what I want.

Can this be done in PowerShell?

Upvotes: 6

Views: 32829

Answers (4)

mklement0

Reputation: 437743

Note:

  • The first part of this answer does not solve the OP's problem - for solutions, see Mathias R. Jessen's helpful answer and Ansgar Wiechers' helpful answer; alternatively, see the bottom of this answer, which offers a generic solution adapted from Mathias' code.

    • Due to an initial misreading of the question, the next section of this answer uses disjunctive logic - matching lines that have at least one matching search term - which is the only logic that findstr.exe and PowerShell's Select-String (directly) support.

      • This part of the answer may still be of interest with respect to translating findstr.exe commands to PowerShell, using Select-String.
    • By contrast, the OP is asking for conjunctive logic, where all search terms must match, which requires additional work; the bottom section shows a solution based on Mathias' answer.


A disjunctive solution: The PowerShell equivalent of the findstr command from the question, but without /c:, that is, the equivalent of

FINDSTR /s /i "word1 word2 word3" *.abc

is:

    (
      Get-ChildItem -File -Filter *.abc -Recurse |
      Select-String -SimpleMatch -Pattern 'word1', 'word2', 'word3'
    ).Count
    
  • /s -> Get-ChildItem -File -Filter *.abc -Recurse outputs all files in the current directory subtree matching *.abc

    • Note that while Select-String is capable of accepting a filename pattern (wildcard expression) such as *.abc, it doesn't support recursion, so the separate Get-ChildItem call is needed, whose output is piped to Select-String.
  • findstr -> Select-String, PowerShell's more flexible counterpart:

    • -SimpleMatch specifies that the -Pattern argument(s) be interpreted as literals rather than as regexes (regular expressions). Note how the defaults differ:

      • findstr expects literals by default (you can switch to regexes with /R).
      • Select-String expects regexes by default (you can switch to literal with -SimpleMatch).
    • /i -> (default behavior); like most of PowerShell, Select-String is case-insensitive by default - add -CaseSensitive to change that.

    • "word1 word2 word3" -> -Pattern 'word1', 'word2', 'word3'; specifying an array of patterns looks for a match for at least one of the patterns on each line (disjunctive logic).

      • That is, all of the following lines would match: ... word1 ..., ... word2 ..., ... word2 word1 ..., ... word3 word1 word2 ...
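  • (...).Count: Select-String outputs a collection of objects representing the matching lines, which this expression simply counts; omit the enclosing (...).Count to output the matches themselves. Note that this does not correspond to findstr's /c: option, which specifies a literal search string rather than requesting a count. The objects output are [Microsoft.PowerShell.Commands.MatchInfo] instances, which include not only the matching line, but also metadata about the input and the specifics of what matched.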
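
The disjunctive behavior of an array of patterns can be checked without any files by piping strings to Select-String. This is only a minimal sketch; the sample lines and the variable names $lines and $found are made up for illustration:

```powershell
# Hypothetical sample lines standing in for file content.
$lines = 'alpha word1 beta',
         'nothing relevant here',
         'word3 then WORD2'

# Each pattern is taken literally (-SimpleMatch); a line is output
# if ANY of the patterns matches it, case-insensitively by default.
$found = $lines | Select-String -SimpleMatch -Pattern 'word1', 'word2', 'word3'

# .Line holds the text of each matching line.
$found | ForEach-Object { $_.Line }
```

The first and third sample lines are returned, the second is not; none of them contains all three words.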


A conjunctive solution, building on Mathias R. Jessen's elegant wrapper function:

Select-StringAll (an MIT-licensed Gist) is a conjunctive-only wrapper function around the disjunctive-only Select-String cmdlet that uses the exact same syntax as the latter, with the exception of not supporting the -AllMatches switch.

That is, Select-StringAll requires that all patterns passed to it - whether they're regexes (by default) or literals (with -SimpleMatch) - match a line.

Applied to the OP's problem, we get:

(
  Get-ChildItem -File -Filter *.abc -Recurse |
  Select-StringAll -SimpleMatch word1, word2, word3
).Count

Note the variations compared to the command at the top:

  • The -Pattern parameter is implicitly bound, by argument position.
  • The patterns are specified as barewords (unquoted) for convenience, though it's generally safer to quote, because it's not easy to remember what arguments need quoting.

Upvotes: 3

Sorizon

Reputation: 43

The following will work only if none of the words is repeated on the same line, as in: word1 hello word1 bye word1

findstr /i /r /c:"word[1-3].*word[1-3].*word[1-3]" *.abc

If the words are never repeated within a line, or if you do want such lines in your results, you can use it.
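
The limitation can be illustrated with PowerShell's -match operator. Note the assumptions: this sketch uses the .NET regex engine rather than findstr.exe's own, on the expectation that this simple pattern behaves the same in both, and the sample strings are made up:

```powershell
# The pattern from above; -match is case-insensitive by default, like findstr /i.
$pattern = 'word[1-3].*word[1-3].*word[1-3]'

'word3 x word1 y word2'       -match $pattern  # True: all three words are present
'word1 hello word1 bye word1' -match $pattern  # True as well, although word2 and word3 are missing
'word1 word2'                 -match $pattern  # False: only two occurrences
```

The pattern merely counts three occurrences of any of the words, and the character class [1-3] is only possible because the placeholder words differ in their last character alone.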

Upvotes: 0

Ansgar Wiechers

Reputation: 200283

Another (admittedly less sophisticated) approach would be to simply daisy-chain filters, since the order of the words doesn't matter. Filter your files for one word first, then filter the output for lines that also contain the second word, then filter that output for lines that also contain the third word.

findstr /s /i "word1" *.abc | findstr /i "word2" | findstr /i "word3"

Using PowerShell cmdlets the above would look like this:

Get-ChildItem -Filter '*.abc' -Recurse | Get-Content | Where-Object {
  $_ -like '*word1*' -and
  $_ -like '*word2*' -and
  $_ -like '*word3*'
}

or (using aliases):

ls '*.abc' -r | cat | ? {
  $_ -like '*word1*' -and
  $_ -like '*word2*' -and
  $_ -like '*word3*'
}

Note that aliases are just to save time typing on the commandline, so I do not recommend using them in scripts.
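
A possible variation, offered only as a sketch: if the file name and line number of each hit matter (as in the findstr output), the first filter can be Select-String, with the remaining words checked against the .Line property of its output objects. The temporary folder, the file sample.abc and its content are made up so that the example runs as-is:

```powershell
# Create a throwaway sample file (hypothetical content).
$dir = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName())
$null = New-Item -ItemType Directory -Path $dir
Set-Content -LiteralPath (Join-Path $dir 'sample.abc') -Value @(
  'word2 and word1 only'
  'word3, word1 and word2 together'
)

# Filter for the first word with Select-String, then narrow the
# results down to lines that also contain the other two words.
$hits = Get-ChildItem -LiteralPath $dir -File -Filter '*.abc' -Recurse |
  Select-String -SimpleMatch -Pattern 'word1' |
  Where-Object { $_.Line -like '*word2*' -and $_.Line -like '*word3*' }

# Each hit is a MatchInfo object that knows its file and line number.
$hits | ForEach-Object { '{0}:{1}: {2}' -f $_.Filename, $_.LineNumber, $_.Line }

# Clean up the sample folder.
Remove-Item -LiteralPath $dir -Recurse -Force
```

Only the second sample line contains all three words, so it is the only one reported.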

Upvotes: 6

Mathias R. Jessen

Reputation: 174485

You can use Select-String to do a regex based search through multiple files.

To match all of multiple search terms in a single string with regular expressions, you'll have to use a lookaround assertion:

Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'

In the above example, this is what's happening with the first command:

Get-ChildItem -Filter *.abc -Recurse

Get-ChildItem searches for files in the current directory
-Filter *.abc limits the results to files whose names end in .abc
-Recurse also searches all subfolders

We then pipe the resulting FileInfo objects to Select-String and use the following regex pattern:

^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$
^              # start of string  
 (?=           # open positive lookahead assertion containing
    .*         # any number of any characters (like * in wildcard matching)
      \b       # word boundary
        word1  # the literal string "word1"
      \b       # word boundary
 )             # close positive lookahead assertion
 ...           # repeat for remaining words
 .*            # any number of any characters
$              # end of string

Since each lookahead group is just being asserted for correctness and the search position within the string never changes, the order doesn't matter.
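
A quick way to convince yourself of this is to test the pattern against a few strings with the -match operator. The sample strings below are made up for illustration:

```powershell
$pattern = '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'

'word3 foo word1 bar word2' -match $pattern  # True: all three words, in any order
'word1 word2'               -match $pattern  # False: word3 is missing
'word1 word2 word31'        -match $pattern  # False: \b prevents "word31" from counting as word3
```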


If you want it to match strings that contain any of the words, you can use a simple non-capturing group:

Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '\b(?:word1|word2|word3)\b'
\b(?:word1|word2|word3)\b
\b          # word boundary
  (?:       # open non-capturing group
     word1  # the literal string "word1"
     |      # or
     word2  # the literal string "word2"
     |      # or
     word3  # the literal string "word3"
  )         # close non-capturing group
\b          # word boundary

These can of course be abstracted away in a simple proxy function.

I generated the param block and most of the body of the Select-Match function definition below with:

$slsmeta = [System.Management.Automation.CommandMetadata]::new((Get-Command Select-String))
[System.Management.Automation.ProxyCommand]::Create($slsmeta)

Then removed unnecessary parameters (including -AllMatches and -Pattern), then added the pattern generator (see inline comments):

function Select-Match
{
    [CmdletBinding(DefaultParameterSetName='Any', HelpUri='http://go.microsoft.com/fwlink/?LinkID=113388')]
    param(
        [Parameter(Mandatory=$true, Position=0)]
        [string[]]
        ${Substring},

        [Parameter(Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
        [Alias('PSPath')]
        [string[]]
        ${LiteralPath},

        [Parameter(ParameterSetName='Any')]
        [switch]
        ${Any},

        [Parameter(ParameterSetName='All')]
        [switch]
        ${All},

        [switch]
        ${CaseSensitive},

        [switch]
        ${NotMatch},

        [ValidateNotNullOrEmpty()]
        [ValidateSet('unicode','utf7','utf8','utf32','ascii','bigendianunicode','default','oem')]
        [string]
        ${Encoding},

        [ValidateNotNullOrEmpty()]
        [ValidateCount(1, 2)]
        [ValidateRange(0, 2147483647)]
        [int[]]
        ${Context}
    )

    begin
    {
        try {
            $outBuffer = $null
            if ($PSBoundParameters.TryGetValue('OutBuffer', [ref]$outBuffer))
            {
                $PSBoundParameters['OutBuffer'] = 1
            }

            # Escape literal input strings
            $EscapedStrings = foreach($term in $PSBoundParameters['Substring']){
                [regex]::Escape($term)
            }

            # Construct pattern based on whether -Any or -All was specified 
            if($PSCmdlet.ParameterSetName -eq 'Any'){
                $Pattern = '\b(?:{0})\b' -f ($EscapedStrings -join '|')
            } else {
                $Clauses = foreach($EscapedString in $EscapedStrings){
                    # Use the loop variable; $_ is not defined here
                    '(?=.*\b{0}\b)' -f $EscapedString
                }
                $Pattern = '^{0}.*$' -f ($Clauses -join '')
            }

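            # Remove the wrapper-only arguments from PSBoundParameters,
            # so that they aren't passed through to Select-String
            $PSBoundParameters.Remove('Substring') |Out-Null
            $PSBoundParameters.Remove('Any') |Out-Null
            $PSBoundParameters.Remove('All') |Out-Null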

            # Add the Pattern parameter argument
            $PSBoundParameters['Pattern'] = $Pattern

            $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand('Microsoft.PowerShell.Utility\Select-String', [System.Management.Automation.CommandTypes]::Cmdlet)
            $scriptCmd = {& $wrappedCmd @PSBoundParameters }
            $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
            $steppablePipeline.Begin($PSCmdlet)
        } catch {
            throw
        }
    }

    process
    {
        try {
            $steppablePipeline.Process($_)
        } catch {
            throw
        }
    }

    end
    {
        try {
            $steppablePipeline.End()
        } catch {
            throw
        }
    }
    <#

    .ForwardHelpTargetName Microsoft.PowerShell.Utility\Select-String
    .ForwardHelpCategory Cmdlet

    #>

}

Now you can use it like this, and it'll behave almost like Select-String:

Get-ChildItem -Filter *.abc -Recurse |Select-Match word1,word2,word3 -All
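
If a full proxy function is more than you need, the pattern-building step can also be sketched on its own. The helper name New-AllWordsPattern and its -Word parameter are hypothetical and not part of the function above; the generated pattern is then passed to Select-String directly:

```powershell
# Hypothetical helper: builds the "all words, any order" regex from a list of words.
function New-AllWordsPattern {
  param([Parameter(Mandatory)] [string[]] $Word)

  # One lookahead per word; [regex]::Escape() makes each word a literal.
  $clauses = foreach ($w in $Word) { '(?=.*\b{0}\b)' -f [regex]::Escape($w) }

  # Concatenate the lookaheads and anchor the pattern to the whole line.
  '^{0}.*$' -f (-join $clauses)
}

$allWords = New-AllWordsPattern word1, word2, word3

# Made-up sample strings; only the first one contains all three words.
'word2 word3 word1', 'word1 only' | Select-String -Pattern $allWords
```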

Upvotes: 8
