PowerShell Regex Ignore up until character string match including string match

Question

I am trying to read a file and ignore everything up until a character match. Sometimes the character match will appear on the same line as the results I need, so I can't use Select-Object -Skip x, where x is a fixed number of lines to skip.

I have tried splitting the results on newlines with the .Split() method, and that worked, but I can't select the index I need because what is returned is a multi-line string.

Below is the start of an example of the text being returned. It's an HTML response that I'm trying to read the data out of. I cannot use the Content property because it's a byte array and has a space between every character, so I've concluded it's time to ask for help with [Regex] in PowerShell.

I was looking at this answer and thought I could use /.+?(?=abc)/ by replacing abc with my search string, like this:

(Get-Content $env:TEMP\test.txt) | ForEach-Object { 
    [Regex]::Match($_, "^.+(?=\)").Value
}

That didn't work either. I'm OK with regex when looking for a match like {\d\d\d} to ensure it's 3 digits long, but I'm not sure how to use it in this instance.

This is the start of a file being returned. I need to ignore everything up to and including the characters

and then anything after that to the end of the file is OK.

Example command and result being returned here:

PS> Get-Content $env:TEMP\test.txt
HTTP/1.1 200 OK
Content-Length: 3524
Date: Thu, 18 Jun 2020 15:00:05 GMT
Last-Modified: Fri, 19 Jun 2020 01:00:05 GMT
Server: TTWS/1.2 on Microsoft-HTTPAPI/2.0
Test TCP WebServer 1.2


    Directory: C:\tmp

EDIT:

I have this now, which removes everything up to and including the first opening tag and also removes the closing tag, but won't remove anything AFTER the closing tag.

(Get-Content $env:TEMP\test.txt -Raw) -replace '(?s)^.*?' -replace '(.+?)'

Can that be extended to also remove everything after the closing tag, through to the end of the file?

Wiktor Stribiżew · Accepted Answer

The .+? pattern is "lazy" (non-greedy): it matches as few characters as it is allowed to. Since .+? is at the end of your pattern and must match at least one character, it will match exactly one character and quit. You need a greedy quantifier, * or +.
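To see the difference on a small scale, here is a minimal sketch comparing lazy and greedy quantifiers with [regex]::Match. The sample string 'abc123' is made up purely for illustration.

```powershell
# Made-up sample string, used only to illustrate quantifier behavior.
$sample = 'abc123'

# Lazy +? : must take at least one character, then stops as early as possible.
$lazyPlus = [regex]::Match($sample, 'a.+?').Value    # 'ab'

# Lazy *? : zero characters are allowed, so nothing after 'a' is consumed.
$lazyStar = [regex]::Match($sample, 'a.*?').Value    # 'a'

# Greedy * : consumes as many characters as possible.
$greedyStar = [regex]::Match($sample, 'a.*').Value   # 'abc123'

"lazy   +? : $lazyPlus"
"lazy   *? : $lazyStar"
"greedy *  : $greedyStar"
```

With nothing after the lazy quantifier to force the engine onward, it stops at the minimum, which is why a trailing .+? removes only one character.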

Besides, you can achieve what you need with a single -replace command if you use a capturing group.

You need to use

(Get-Content $env:TEMP\test.txt -Raw) -replace '(?s)^.*?(.*?).*', '$1'

It will take the whole file content and return only the text between the first opening tag and the closest closing tag after it.
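Here is a self-contained sketch of the same single -replace with a capturing group. It uses a here-string instead of reading $env:TEMP\test.txt, and the <PRE>/</PRE> pair is an assumed stand-in for the opening and closing tags; substitute the actual markers from your file.

```powershell
# Hypothetical response text. <PRE> and </PRE> are assumed example tags,
# not necessarily the ones in the real file.
$content = @'
HTTP/1.1 200 OK
Content-Length: 3524

<HTML><BODY><PRE>
    Directory: C:\tmp
</PRE>footer text</BODY></HTML>
'@

# (?s)      : let . match newlines, so the pattern can span lines
# ^.*?<PRE> : skip everything up to and including the first opening tag
# (.*?)     : capture as little as possible (group 1)
# </PRE>.*  : the closest closing tag, then everything to the end of the string
$extracted = $content -replace '(?s)^.*?<PRE>(.*?)</PRE>.*', '$1'

# The captured text still contains the line breaks around the tags; Trim() removes them.
$extracted.Trim()
```

Against the real file, the -Raw switch matters: it makes Get-Content return the file as one string, so (?s) can span lines. Without it, Get-Content returns one string per line and -replace runs on each line separately. Note also that -replace is case-insensitive by default; -creplace is the case-sensitive variant.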

Pattern details

(?s) - a RegexOptions.Singleline inline modifier making . match newlines, too
^ - start of string
.*? - any zero or more chars as few as possible
- a text
(.*?) - capturing group #1: any zero or more chars as few as possible
- a text
.* - any zero or more chars as many as possible (as * is a greedy quantifier).
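A minimal sketch of what the (?s) modifier changes, using a made-up two-line string:

```powershell
# Made-up two-line string (`n is a line feed inside double quotes).
$text = "first line`nsecond line"

# Default: . does not match a newline, so the match stops at the end of line 1.
$withoutS = [regex]::Match($text, 'first.*').Value      # 'first line'

# (?s) = RegexOptions.Singleline: . also matches the newline, so the match runs to the end.
$withS = [regex]::Match($text, '(?s)first.*').Value     # the whole string

$withoutS
$withS
```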

The $1 in the replacement pattern puts the Group 1 value back into the result, so only that captured text remains.
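One PowerShell-specific detail about $1: keep the replacement in single quotes (or escape the dollar sign with a backtick) so PowerShell passes $1 to the regex engine instead of expanding it as a PowerShell variable first. A small sketch with a made-up key=value string:

```powershell
# Made-up input string for illustration.
$s = 'key=value'

# Single quotes: the regex engine receives $1 and substitutes group 1.
$single = $s -replace '^key=(.*)$', '$1'      # 'value'

# Double quotes with a backtick-escaped dollar sign: also reaches the engine as $1.
$escaped = $s -replace '^key=(.*)$', "`$1"    # 'value'

# Plain double quotes: PowerShell expands $1 as a variable before -replace runs.
# $1 is normally undefined, so the replacement becomes an empty string.
$expanded = $s -replace '^key=(.*)$', "$1"

$single
$escaped
```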
