user3046742

Reputation: 235

Powershell Regex: Reading a multi-line string between two points

I have a common enough problem with a PowerShell regex for reading multi-line records. I've read the threads asking similar questions but can't quite get the solutions to work in my case.

My file consists of multi-line records of variable length. The records I am interested in start with a 01 or a 02 followed by a V or an M. A record ends whenever another record begins or when a batch record starting with '50' is found. The first three characters of each line identify the record.

ie 01V (Start of record - content follows here) 01

I'm trying to read the individual records with a regex by identifying the start and the end.

What I have at the moment is based off this answer: Match everything between two words in Powershell

#Read the file as a single string
$FilePath = "blaablaablaa"
$TestFile = get-content $FilePath | Out-String 

#(?<= Assert that this matches before the current position
# 0(1|2)(V|M) if the record is 01V or 01M or 02V or 02M 
# ) End assertion 
# .+? Match any number of characters, few as possible
# (?= Until a record starting with 70 is found  
# ) End of look ahead
$regex = [regex] '(?is)(?<=0(1|2)(V|M)).+?(?=70)'
echo $TestFile |  select-string -Pattern $regex 

The above works on single-line strings if I remove the pipe to Out-String, but with the Out-String pipe it returns the entire file. I'm guessing I'm not handling the \n characters correctly.

Any advice? The input file looks roughly like this:

00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal

The required output splits the file into individual records, which are delimited by a '50' or a '90' line or by the start of another record. This, for example, is the final record:

01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx

Upvotes: 2

Views: 2165

Answers (2)

asontu

Reputation: 4659

Assuming (by your description) you also want to match the part from 01M until the next 01M, and then that one separately until the 50, this would do the trick:

(?mis)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+

Explanation: after matching 0, then 1 or 2, then V or M at the start of a line, the part in the (?:...) is this:

[^\n]|\n(?!0[12][VM]|50|90)

Which means:

match any character that isn't a new-line

OR

a newline that is not followed (?!...) by either the beginning of a new record or 50 or 90.

online Regex101 demo
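
For reference, here is a minimal sketch of how the pattern might be applied from PowerShell. The sample text is a shortened stand-in for the question's file, and the variable names ($Text, $Pattern, $Records) are illustrative only. The option group is written as (?mis), since .NET has no inline g option and [regex]::Matches returns every match anyway.

```powershell
# Shortened stand-in for the question's file (the x-runs are placeholders).
$Text = @(
    '00 date'
    '01Mxxxx'
    '01=0xxxx'
    '01Mxxxx'
    '01=9xxxx'
    '50 xxxx'
    '01Vxxxx'
    '01$1 xxxx'
    '50 xxxx BatchTotal'
    '90 xxxx FILETotal'
) -join "`n"

# A record starts at a line beginning with 01/02 + V/M and keeps consuming
# characters, including newlines, unless the next line starts a new record
# or begins with 50 or 90.
$Pattern = '(?mis)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+'

# One string per record; @() keeps the result an array even for a single match.
$Records = @([regex]::Matches($Text, $Pattern) | ForEach-Object { $_.Value })

$Records.Count   # number of records found
$Records[-1]     # the last record
```

If the file is read with Get-Content -Raw and has Windows line endings, each record would probably keep a trailing carriage return, because [^\n] also matches \r. Calling .TrimEnd() on each value should take care of that.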

Upvotes: 1

mjolinor

Reputation: 68263

Using your test data:

@'
00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal
'@ | set-content testfile.txt


$Text = Get-Content ./testfile.txt -Raw

$regex = @'
(?ms)(01(?:M|V).+?)
(?:5|9)0.+?
'@


$Records = 
[regex]::Matches($Text,$regex) |
foreach {$_.groups[1].value}

$Records[-1]

01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
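
One thing to be aware of: the pattern above only stops at a 50 or 90 line, so on the test data the two consecutive 01M records appear to come back as a single match. If they need to be separate, a lookahead that also stops at the next record start is one option. The sketch below compares the two on a shortened sample. $Grouped uses the pattern above with an explicit \r?\n in place of the here-string line break, and $Split uses a hypothetical variant; both names and the variant itself are assumptions, not part of the original answer.

```powershell
# Shortened stand-in for the test data (the x-runs are placeholders).
$Sample = @(
    '00 date'
    '01Mxxxx'
    '01=0xxxx'
    '01Mxxxx'
    '01=9xxxx'
    '50 xxxx'
    '01Vxxxx'
    '01$1 xxxx'
    '50 xxxx BatchTotal'
    '90 xxxx FILETotal'
) -join "`n"

# The pattern from this answer: capture from 01M/01V up to a line starting with 50 or 90.
$BatchPattern = '(?ms)(01(?:M|V).+?)\r?\n(?:5|9)0.+?'
$Grouped = @([regex]::Matches($Sample, $BatchPattern) |
    ForEach-Object { $_.Groups[1].Value })

# Variant (assumption): stop lazily before a line break that is followed by
# a new record, a 50 line, or a 90 line, or at the end of the text.
$RecordPattern = '(?ms)^0[12][MV].*?(?=\r?\n(?:0[12][MV]|50|90)|\z)'
$Split = @([regex]::Matches($Sample, $RecordPattern) |
    ForEach-Object { $_.Value })

$Grouped.Count   # records found when only 50/90 act as delimiters
$Split.Count     # records found when a new record start also delimits
```

The variant also accepts 02 record types, which the question mentions but the test data does not contain.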

Upvotes: 0
