Reputation: 107
The input file content is at the bottom. The image shows the file format more clearly.
As you can see, my input file contains many lines that I don't need, so I'm trying to tell PowerShell to start reading only when a line matches the pattern shown below. However, the match returns False, so the script never does what I want, which is to copy everything between the line matching the regex and the - sign that marks the end of the block.
Any idea what I'm doing wrong?
$InputFile = gc "D:\input_file.txt"
$Dest = "D:\Desktop\Final_file.txt"
#PATTERN I'M LOOKING FOR:
0000 00XKDPMBBRAXXX00000
1965 81PWSLKDTRUGXX00000
#REGEX I'VE CREATED BASED ON ABOVE CONTENT
$re = [regex]'(\d{4}\s\d{2}\[a-z]{12}\d{5})'
$file_line_num = 0
$mesg_line_num = 0
$Dest_count = 0
foreach ($line in $Input_File) {
    $file_line_num = $file_line_num + 1
    # Find where message starts, any other lines are ignored
    if ($line -match $re) {
        [void]$foreach.MoveNext() # skip lines not needed
        $msg_line_num = 0
        do {
            [void]$foreach.MoveNext()
            $line = $foreach.current
            $msg_line_num = $msg_line_num + 1
            if ($msg_line_num -lt 3) {
                $header = $line.substring(7,8) + $line.substring(16, 3)
                add-content $Dest $header
            } else {
                add-content $Dest $line
            }
        } until ($line -eq "-" -or $line -eq $null)
    }
}
Exit
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
TEXTHERE TEXTHERE TEXTHERE
TEXTHERE
.TEXTHERE TEXTHERE TEXTHERE
TEXTHERE TEXTHERE
0000 00XKDPMBBRAXXX00000
1965 81PWSLKDTRUGXX00000
123 99
TEXTHERE
TEXTHERE//TEXTHERE
TEXTHERE
TEXTHERE
TEXTHERE
TEXTHERE//TEXTHERE
TEXTHERE//TEXTHERE
TEXTHERE TEXTHERE
TEXTHERE TEXTHERE
TEXTHERE TEXTHERE
-
=TEXTHERE TEXTHERE
=TEXTHERE TEXTHERE
NNNN++++++++++++++++++++++++++++++++++++
+ +
+ -- =TEXTHERE TEXTHERE +
+ =TEXTHERE TEXTHERE +
+ +
++++++++++++++++++++++++++++++++++++++++
TEXTHERE TEXTHERE TEXTHERE
TEXTHERE
.TEXTHERE TEXTHERE TEXTHERE
TEXTHERE TEXTHERE
0000 00XKDPMBBRAXXX00000
1965 81PWSLKDTRUGXX00000
123 99
TEXTHERE
TEXTHERE//TEXTHERE
TEXTHERE
TEXTHERE
TEXTHERE
TEXTHERE//TEXTHERE
TEXTHERE//TEXTHERE
TEXTHERE TEXTHERE
TEXTHERE TEXTHERE
TEXTHERE TEXTHERE
-
=TEXTHERE TEXTHERE
=TEXTHERE TEXTHERE
NNNN++++++++++++++++++++++++++++++++++++
+ +
+ -- =TEXTHERE TEXTHERE +
+ =TEXTHERE TEXTHERE +
+ +
++++++++++++++++++++++++++++++++++++++++
Upvotes: 0
Views: 894
Reputation: 73686
\[a-z] should be [A-Z]: the backslash is not needed, because it makes the pattern match a literal [. Also, the [regex] class is case-sensitive, unlike the usual -match operator.
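A minimal sketch of the two points above, using one header line copied from the question's sample. The variable names $line, $broken and $fixed are only for illustration and are not part of the original script.

```powershell
# Sample header line taken from the question's input file.
$line = '0000 00XKDPMBBRAXXX00000'

# Original pattern: '\[' is a literal '[', so the character class is gone.
$broken = '\d{4}\s\d{2}\[a-z]{12}\d{5}'
# Backslash removed: [a-z] is a character class again.
$fixed = '\d{4}\s\d{2}[a-z]{12}\d{5}'

$brokenResult    = $line -match $broken            # False
$operatorResult  = $line -match $fixed             # True: -match ignores case
$classLowerCase  = [regex]::IsMatch($line, $fixed) # False: Regex class is case-sensitive
$classUpperCase  = [regex]::IsMatch($line, '\d{4}\s\d{2}[A-Z]{12}\d{5}') # True

$brokenResult, $operatorResult, $classLowerCase, $classUpperCase
```

If the [regex] class is used directly, either write [A-Z] (or [a-zA-Z]) or prefix the pattern with (?i) to make it case-insensitive.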
Anyway, it's possible to shorten the code (PowerShell 3.0 and newer):
$all = ([regex]'(?s)(?<=(\d{4}\s\d{2}[a-zA-Z]{12}\d{5}\r?\n){2})(.*?)(?=\r?\n-\r?\n)').
Matches((Get-Content source.txt -raw)).Value
Or PowerShell 2.0:
$all = ([regex]'(?s)(?<=(\d{4}\s\d{2}[a-zA-Z]{12}\d{5}\r?\n){2})(.*?)(?=\r?\n-\r?\n)').
Matches([IO.File]::ReadAllText('r:\source.txt')) | Select -expand Value
To include the boundary lines in the copy as well, change the groups in the regex:
'(?s)(?:\d{4}\s\d{2}[a-zA-Z]{12}\d{5}\r?\n){2}.*?\r?\n-\r?\n'
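To see the difference between the two patterns, here is a small sketch that runs both against an in-memory string instead of a file. The $text sample is a hypothetical, shortened stand-in for the question's input, and $inner and $outer are illustrative names.

```powershell
# Hypothetical shortened sample, joined with LF line endings.
$text = @(
    'TEXTHERE TEXTHERE'
    '0000 00XKDPMBBRAXXX00000'
    '1965 81PWSLKDTRUGXX00000'
    '123 99'
    'TEXTHERE//TEXTHERE'
    '-'
    '=TEXTHERE TEXTHERE'
) -join "`n"

# Lookbehind/lookahead version: the two header lines and the '-' line are excluded.
$inner = [regex]'(?s)(?<=(\d{4}\s\d{2}[a-zA-Z]{12}\d{5}\r?\n){2})(.*?)(?=\r?\n-\r?\n)'
# Non-capturing version: the header lines and the '-' line are part of the match.
$outer = [regex]'(?s)(?:\d{4}\s\d{2}[a-zA-Z]{12}\d{5}\r?\n){2}.*?\r?\n-\r?\n'

$innerValue = $inner.Match($text).Value
$outerValue = $outer.Match($text).Value

$innerValue
'---- with boundaries ----'
$outerValue
```

With this sample, the first match should begin at the 123 99 line and stop before the - line, while the second should begin at the first header line and end with the - line and its trailing newline.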
Upvotes: 3
Reputation:
> select-string .\input_file.txt -Pattern '(\d{4})\s(\d{2}[a-z]{12}\d{5})'
input_file.txt:8:0000 00XKDPMBBRAXXX00000
input_file.txt:9:1965 81PWSLKDTRUGXX00000
input_file.txt:38:0000 00XKDPMBBRAXXX00000
input_file.txt:39:1965 81PWSLKDTRUGXX00000
> select-string .\input_file.txt -Pattern '(\d{4})\s(\d{2}[a-z]{12}\d{5})'|%{$_.matches.captures.value}
0000 00XKDPMBBRAXXX00000
1965 81PWSLKDTRUGXX00000
0000 00XKDPMBBRAXXX00000
1965 81PWSLKDTRUGXX00000
> select-string .\input_file.txt -Pattern '(\d{4})\s(\d{2}[a-z]{12}\d{5})'|%{$_.matches.groups[1,2].value}
0000
00XKDPMBBRAXXX00000
1965
81PWSLKDTRUGXX00000
0000
00XKDPMBBRAXXX00000
1965
81PWSLKDTRUGXX00000
Upvotes: 0