DevHawk

Reputation: 107

Extract part of text in PowerShell

This is my input file. The values are random: the digits can be any number, not just 9999, and the letters can be any letters. Lines in the format below always come after a line containing a - (dash).

-
9999 99AKDSLY9ZWSRK99999
9999 99BGRPOE99FTRQ99999

Expected output:

AKDSLY9ZSRK
BGRPOE99TRQ

So I need to remove the first part of each line, always numbers:

9999 99
9999 99

Then remove the character that is not required:

99AKDSLY9ZW → in this case it is the W, but it could be any letter
99BGRPOE99F → in this case it is the F, but it could be any letter

And finally remove the last 5 digits, always numbers:

99999
99999

What I'm trying to use is a regex (this is my first time using one):

$result = [regex]::Matches($InputFile, '(^\d{4}\s\d{2}[A-Z0-9]\d{5}$)') -replace '\d{4}\s\d{2}', ''
$result

It's not giving me an error message, but it's also not showing the characters I'm expecting to see in $result.

I was expecting to see something in $result to then start the formatting, deleting the characters I don't need.

What could be missing here, please?

Upvotes: 0

Views: 640

Answers (1)

Ansgar Wiechers

Reputation: 200493

Try something like this:

$str = (Get-Content ... -Raw) -replace '\r'

$cb = {
  $args[0].Groups[1].Value -replace '(?m)^.{7}' -replace '(?m).(.{3}).{5}$', '$1'
}

$re = [regex]'(?m)^(?<=-\n)((?:\d{4}\s\d{2}[^\n]*\d{5}(?:\n|$))+)'

$re.Replace($str, $cb)

The regular expression $re matches multiline substrings that are directly preceded by a hyphen and a newline and that consist of one or more lines with your digit/letter combinations. The (?<=...) is a positive lookbehind assertion, which ensures that you only get a match when those lines are preceded by a line with a hyphen, without making that line part of the actual match.
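
To see the effect of the lookbehind, here is a minimal sketch that runs the same pattern against a made-up string. The variable names $sample and $found and the extra "foo" and "1234 56..." lines are assumptions added for illustration; they are not part of the question's input.

```powershell
# Hypothetical sample text with LF-only newlines (CRs already removed).
# Only the first two data lines directly follow a "-" line.
$sample = "-`n9999 99AKDSLY9ZWSRK99999`n9999 99BGRPOE99FTRQ99999`nfoo`n1234 56ZZZZZZZZZZZZ12345`n"

# Same pattern as $re above.
$re = [regex]'(?m)^(?<=-\n)((?:\d{4}\s\d{2}[^\n]*\d{5}(?:\n|$))+)'

$found = $re.Matches($sample)

$found.Count                 # 1: the line after "foo" is not preceded by "-"
$found[0].Groups[1].Value    # the two 9999 99... lines, without the "-" line
```

The last data line has the right shape but is skipped, because the text in front of it is "foo" plus a newline rather than a hyphen plus a newline.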

The scriptblock $cb is an anonymous callback function that the Regex.Replace() method calls on each match. For each line in a match it removes the first 7 characters, and replaces the last 9 characters with the 2nd through 4th of those characters. In other words, it drops the one unwanted letter and the five trailing digits while keeping the three characters between them.
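
The two -replace operations inside the callback can also be tried on their own. The following sketch applies them one at a time to the sample lines from the question; the variable names $block, $step1 and $step2 are made up for this illustration.

```powershell
# One matched block, as the callback would receive it in Groups[1].Value.
$block = "9999 99AKDSLY9ZWSRK99999`n9999 99BGRPOE99FTRQ99999`n"

# Step 1: drop the first 7 characters of every line ("9999 99").
$step1 = $block -replace '(?m)^.{7}'

# Step 2: match the last 9 characters of every line and keep only
# the 2nd through 4th of them (the capture group).
$step2 = $step1 -replace '(?m).(.{3}).{5}$', '$1'

$step2   # AKDSLY9ZSRK and BGRPOE99TRQ, one per line
```

Note that the replacement string '$1' is in single quotes, so PowerShell passes it to the regex engine unchanged instead of trying to expand it as a variable.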

For simplicity, the sample code removes carriage return characters (CR, \r) from the string, so that all newlines are plain linefeed characters (LF, \n) rather than the CR-LF pairs that are the default on Windows.
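
A small sketch of why that step matters, using a made-up CR-LF string (the names $crlf, $before and $after are assumptions for illustration): the pattern expects a hyphen immediately followed by \n, so a CR in between prevents any match.

```powershell
# Hypothetical input with Windows-style CR-LF line endings.
$crlf = "-`r`n9999 99AKDSLY9ZWSRK99999`r`n"

# Same pattern as $re above.
$re = [regex]'(?m)^(?<=-\n)((?:\d{4}\s\d{2}[^\n]*\d{5}(?:\n|$))+)'

# With the CRs still present, the lookbehind sees "\r\n" instead of "-\n".
$before = $re.IsMatch($crlf)

# After stripping the CRs, the same pattern matches.
$after = $re.IsMatch(($crlf -replace '\r'))

$before   # False
$after    # True
```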

Upvotes: 1
