user68288

Reputation: 774

Using Select-Object in PowerShell, how can I select only the part of a string I want on a per line basis?

Currently I have a script that searches a directory and finds all instances of the word "dummy". It then writes the FileName, Path, LineNumber, and Line of each match to a CSV file.

The Line value follows a very standardized format, like:

I am trying to find a way to output an additional column in my CSV that contains only the characters after "dummy," and before the "?".

Resulting lines would be:

I tried to use split but it keeps removing additional characters. Is it possible to find the index of "dummy," and "?" and then substring out the middle portion?

Any help would be greatly appreciated.

Code as it stands:

Write-Host "Hello, World!"

# path
$path = 'C:\Users\Documents\4_Testing\fe\*.ts'
# pattern to find dummy
$pattern = "dummy,"

Get-ChildItem -Recurse -Path $path | Select-String -Pattern $pattern |
Select-Object FileName,Path,LineNumber,Line,
@{name='Function';expression={
    $_.Line.Split("dummy,")
}} |
Export-Csv 'C:\Users\User\Documents\4_Testing\Output1.csv' -NoTypeInformation

Write-Host "Complete"

Upvotes: 1

Views: 945

Answers (1)

Mathias R. Jessen

Reputation: 174485

Use the -replace operator to replace the whole line with just the part between dummy, and ?:

PS ~> 'Hi I am a dummy, who are you?' -replace '^.*dummy,\s*(.*)\?\s*$', '$1'
who are you

So your calculated property definition should look like this:

@{Name = 'Function'; Expression = { $_.Line -replace '^.*dummy,\s*(.*)\?\s*$', '$1' }}
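As a rough sketch of how this property slots into Select-Object, the snippet below runs two made-up objects through the same calculated property. The $samples and $rows variables and the sample lines are assumptions for illustration only; in the real script the objects would come from Select-String.

```powershell
# Hypothetical stand-ins for Select-String matches; only the Line property matters here.
$samples = @(
    [pscustomobject]@{ Line = 'Hi I am a dummy, who are you?' }
    [pscustomobject]@{ Line = 'no marker on this line' }
)

# Same calculated property as above, added next to the existing Line column.
$rows = $samples | Select-Object Line, @{
    Name       = 'Function'
    Expression = { $_.Line -replace '^.*dummy,\s*(.*)\?\s*$', '$1' }
}

$rows | Format-Table -AutoSize
```

Note that -replace returns its input unchanged when the pattern does not match, so the second row's Function column holds the whole line rather than an empty string.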

The pattern used above describes:

^         # start of string
 .*       # 0 or more of any character
 dummy,   # the literal substring `dummy,`
 \s*      # 0 or more whitespace characters
 (        # start of capture group
  .*      # 0 or more of any character
 )        # end capture group
 \?       # a literal question mark
 \s*      # 0 or more whitespace characters
$         # end of line/string
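If you want to keep that breakdown next to the code, one option is to write the pattern in free-spacing mode. This is only a sketch: the (?x) inline option is a .NET regex feature that ignores unescaped whitespace and # comments inside the pattern, and the $pattern and $result variable names are made up for the example.

```powershell
# Single-quoted here-string, so PowerShell does not expand $ or backslashes.
$pattern = @'
(?x)        # free-spacing mode: whitespace and comments in the pattern are ignored
^           # start of string
.*          # 0 or more of any character
dummy,      # the literal substring dummy,
\s*         # 0 or more whitespace characters
(.*)        # capture group: 0 or more of any character
\?          # a literal question mark
\s*         # 0 or more whitespace characters
$           # end of string
'@

$result = 'Hi I am a dummy, who are you?' -replace $pattern, '$1'
$result
```

This behaves the same as the one-line pattern and is purely a readability choice.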

If you also want to remove everything after the first ?, change the pattern slightly:

@{Name = 'Function'; Expression = { $_.Line -replace '^.*dummy,\s*(.*?)\?.*$', '$1' }}

Adding the metacharacter ? to .* makes the subexpression lazy: the regex engine matches as few characters as possible, so the group only captures up to the first ?.
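To see the difference between the two patterns, here is a small comparison on a line with two question marks. The sample line and the $line, $greedy and $lazy variable names are invented for illustration.

```powershell
# Hypothetical input containing more than one question mark.
$line = 'Hi I am a dummy, who are you? And why?'

# Greedy group: captures up to the last ? on the line.
$greedy = $line -replace '^.*dummy,\s*(.*)\?\s*$', '$1'

# Lazy group: captures only up to the first ? after "dummy,".
$lazy = $line -replace '^.*dummy,\s*(.*?)\?.*$', '$1'

"greedy: $greedy"
"lazy:   $lazy"
```

With this input the greedy version keeps who are you? And why, while the lazy version keeps only who are you.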

Upvotes: 3
