Reputation: 235
I have a common enough problem with a PowerShell regex to read multi-line records. I've read the threads asking similar questions but can't quite get the solutions to work in my case.
My file consists of multi-line records of variable length. The records I am interested in start with 01 or 02 followed by V or M. A record ends whenever another record begins or when a batch record starting with '50' is found. The first three characters of each line identify the record.
ie 01V (Start of record - content follows here) 01
I'm trying to read the individual records with a regex by identifying the start and the end.
What I have at the moment is based off this answer: Match everything between two words in Powershell
#Read the file as a single string
$FilePath = "blaablaablaa"
$TestFile = get-content $FilePath | Out-String
# (?<= Assert that this matches before the current position
# 0(1|2)(V|M) if the record is 01V or 01M or 02V or 02M
# ) End assertion
# .+? Match any number of characters, few as possible
# (?= Until a record starting with 70 is found
# ) End of look ahead
$regex = [regex] '(?is)(?<=0(1|2)(V|M)).+?(?=70)'
echo $TestFile | select-string -Pattern $regex
The above works on single-line strings if I remove the pipe to Out-String, but with the Out-String pipe it returns the entire file. I'm guessing I'm not handling the \n characters correctly.
Any advice? The input file looks roughly like this:
00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal
The required output splits the file into individual records, each delimited by a '50', a '90', or the start of another record. For example, this is the final record:
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
Upvotes: 2
Views: 2165
Reputation: 4659
Assuming (from your description) that you also want to match the part from 01M until the next 01M, and then that one separately until the 50, this would do the trick:
(?mis)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+
Explanation: after matching 0, then 1 or 2, then V or M, the part in the (?:...) group is this:
[^\n]|\n(?!0[12][VM]|50|90)
Which means: match any character that isn't a newline, OR a newline that is not followed (?!...) by either the beginning of a new record or 50 or 90.
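As a rough sketch of how that pattern could be applied from PowerShell: the snippet below runs it through [regex]::Matches, which returns every match, so no "global" flag is needed (the .NET regex engine does not accept g as an inline option). The sample text and the variable names $sample, $pattern and $records are made up for illustration; the sample is a shortened stand-in for the question's data.

```powershell
# Hypothetical sample, shortened from the data in the question.
# Single quotes keep $1 literal; the lines are joined with "`n".
$sample = @(
    '00 date'
    '01Maaa'
    '01=0bbb'
    '01Mccc'
    '01=9ddd'
    '50 batch'
    '01Veee'
    '01$1 fff'
    '50 BatchTotal'
    '90 FILETotal'
) -join "`n"

# (?m) lets ^ match at the start of every line.
# The group consumes either a non-newline character, or a newline
# that is NOT followed by a record start (0[12][VM]), 50 or 90.
$pattern = '(?m)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+'

# [regex]::Matches returns all matches; keep only the matched text.
$records = [regex]::Matches($sample, $pattern) | ForEach-Object { $_.Value }

$records.Count   # number of records found
$records[-1]     # the last record
```

On a real file, reading it with Get-Content -Raw (or the question's Out-String pipe) would supply the single string. If the file uses Windows line endings, each record may keep a trailing carriage return, which .TrimEnd() would remove.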
Upvotes: 1
Reputation: 68263
Using your test data:
@'
00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal
'@ | set-content testfile.txt
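# Read the whole file as one string (-Raw needs PowerShell 3.0 or later)
$Text = Get-Content ./testfile.txt -Raw
# Group 1 captures from 01M or 01V up to the line break before a 50 or 90 line.
# The line break inside the here-string is part of the pattern.
$regex = @'
(?ms)(01(?:M|V).+?)
(?:5|9)0.+?
'@
# Collect the text of capture group 1 from every match
$Records =
[regex]::Matches($Text,$regex) |
foreach {$_.groups[1].value}
# Show the last record
$Records[-1]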
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01$0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$5xxxxxxxxxxxxxxxxxxxxxxxxxxx
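A possible refinement, offered as a sketch and not as this answer's method: the pattern above ends a match only at a 50 or 90 line, so consecutive 01M records with no batch line between them appear to come back as one match. The variant below moves the terminator into a lookahead and adds the record prefix to it, so each record is returned on its own. It also goes through Select-String -AllMatches, as the question did, to show that the matched text sits in the Matches property. The default display of a Select-String result is the whole input string, which may be why the question's attempt seemed to return the entire file. The sample text and the names $sample, $pattern, $result and $records are assumptions for illustration, and the sketch assumes the data ends with a 50 or 90 line as in the question.

```powershell
# Hypothetical sample, shortened from the data in the question.
$sample = @(
    '00 date'
    '01Maaa'
    '01=0bbb'
    '01Mccc'
    '01=9ddd'
    '50 batch'
    '01Veee'
    '01$1 fff'
    '50 BatchTotal'
    '90 FILETotal'
) -join "`n"

# (?m): ^ matches at each line start; (?s): . also matches line breaks.
# .+? grows lazily until the lookahead sees a line break followed by
# the next record start, or by a 50 or 90 line. \r? tolerates CRLF files.
$pattern = '(?ms)^0[12][MV].+?(?=\r?\n(?:0[12][MV]|50|90))'

# The whole text is one input string, so Select-String emits one result;
# -AllMatches makes its Matches property hold every match in that string.
$result  = $sample | Select-String -Pattern $pattern -AllMatches
$records = $result.Matches | ForEach-Object { $_.Value }

$records.Count   # number of records found
$records[-1]     # the last record
```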
Upvotes: 0