Reputation: 1
I'm trying to read a file and match multiple lines using regex, but I'm running into some issues. The file I'm trying to read looks like this:
I 09/07/20 05:55PM [Backup Set] Starting backup to CrashPlan Central: 122 files (93.30MB) to back up
I 09/07/20 06:00PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:39s: 147 files (197.90MB) backed up, 5.30MB encrypted and sent @ 323.5Kbps (Effective rate: 2.7Mbps)
I 09/07/20 06:00PM - Unable to backup 1 file (next attempt within 15 minutes)
I 09/07/20 06:15PM [Backup Set] Starting backup to CrashPlan Central: 27 files (250MB) to back up
I 09/07/20 06:19PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:03s: 28 files (250MB) backed up, 5MB encrypted and sent @ 302.5Kbps (Effective rate: 4.3Mbps)
I 09/07/20 06:34PM [Backup Set] Starting backup to CrashPlan Central: 18 files (169.30KB) to back up
Lines appear to end in CR LF. Ultimately, I'd like to find every line containing "Completed backup to" that is not immediately followed by a line containing "Unable to backup". However, I'm having trouble with even the simplest of queries.
Here's how I'm pulling in the text:
PS C:\temp> $rawtext = Get-Content '.\new 1.txt' -raw
PS C:\temp> $rawtext.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object
PS C:\temp> $rawtext | Measure-Object -Line
Lines Words Characters Property
----- ----- ---------- --------
6
And the results of some simple regex queries:
PS C:\temp> Select-String -InputObject $rawtext -pattern '^.*Completed.*$' # returns nothing
PS C:\temp> Select-String -InputObject $rawtext -pattern '(?m)^.*Completed.*$' # returns the entire contents of $rawtext
PS C:\temp> Select-String -InputObject $rawtext -pattern '(?ms)^.*Completed.*$' # also returns the entire contents of $rawtext
PS C:\temp> Select-String -InputObject $rawtext -pattern '(?ms)^.*Completed.*\r\n$' # returns nothing
PS C:\temp> Select-String -InputObject $rawtext -pattern '(?ms)^.*Completed.*\r\n' # returns the entire contents of $rawtext
I would expect at least one of those queries to return every line that contains "Completed", but apparently PowerShell isn't processing multiple lines the way I assumed it would. Can anybody shed some light on how to process multi-line regex in PowerShell?
FWIW, the following command gets what I want in the OS X Terminal, and is essentially what I'd like to replicate in PoSH:
completedBackups=$(sed '/Completed[[:space:]]backup[[:space:]]to/!d;$!N;/\n.*Unable[[:space:]]to[[:space:]]backup[[:space:]]/!P;D' $f)
Upvotes: 0
Views: 95
Reputation: 16086
Why not just do this...
# Create the data file
'
I 09/07/20 05:55PM [Backup Set] Starting backup to CrashPlan Central: 122 files (93.30MB) to back up
I 09/07/20 06:00PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:39s: 147 files (197.90MB) backed up, 5.30MB encrypted and sent @ 323.5Kbps (Effective rate: 2.7Mbps)
I 09/07/20 06:00PM - Unable to backup 1 file (next attempt within 15 minutes)
I 09/07/20 06:15PM [Backup Set] Starting backup to CrashPlan Central: 27 files (250MB) to back up
I 09/07/20 06:19PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:03s: 28 files (250MB) backed up, 5MB encrypted and sent @ 302.5Kbps (Effective rate: 4.3Mbps)
I 09/07/20 06:34PM [Backup Set] Starting backup to CrashPlan Central: 18 files (169.30KB) to back up
' |
Out-File -FilePath 'D:\Temp\BackUpLog.txt'
(Get-Content -Path 'D:\temp\BackUpLog.txt').GetType()
# Results
<#
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
#>
((Get-Content -Path 'D:\temp\BackUpLog.txt') |
Measure-Object -Line).Lines
# Results
<#
6
#>
(Get-Content -Path 'D:\temp\BackUpLog.txt' -Raw).GetType()
# Results
<#
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object
#>
((Get-Content -Path 'D:\temp\BackUpLog.txt' -Raw) |
Measure-Object -Line).Lines
# Results
<#
6
#>
# Get-Content without -Raw already returns one string per line, so Select-String can filter them directly
Get-Content -Path 'D:\temp\BackUpLog.txt' |
Select-String -Pattern 'Completed backup to'
# Or use the -match operator, which filters an array of strings and returns the matching elements
(Get-Content -Path 'D:\temp\BackUpLog.txt') -match 'Completed backup to'
# Results of both are
<#
I 09/07/20 06:00PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:39s: 147 files (197.90MB) backed up, 5.30MB encrypted and sent @ 323.5Kbps (Effective rate: 2.7Mbps)
I 09/07/20 06:19PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:03s: 28 files (250MB) backed up, 5MB encrypted and sent @ 302.5Kbps (Effective rate: 4.3Mbps)
#>
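Both commands above return every Completed line, including the 06:00PM entry that is immediately followed by an Unable to backup line. Because Get-Content without -Raw yields an array of lines, the "not followed by" condition can be checked by looking at the next array element. A minimal sketch, with the sample lines hard-coded in $lines (a name chosen for illustration) rather than read from D:\temp\BackUpLog.txt:

```powershell
# Sample lines from the question's log, hard-coded here instead of using Get-Content
$lines = @(
    'I 09/07/20 05:55PM [Backup Set] Starting backup to CrashPlan Central: 122 files (93.30MB) to back up'
    'I 09/07/20 06:00PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:39s: 147 files (197.90MB) backed up, 5.30MB encrypted and sent @ 323.5Kbps (Effective rate: 2.7Mbps)'
    'I 09/07/20 06:00PM - Unable to backup 1 file (next attempt within 15 minutes)'
    'I 09/07/20 06:15PM [Backup Set] Starting backup to CrashPlan Central: 27 files (250MB) to back up'
    'I 09/07/20 06:19PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:03s: 28 files (250MB) backed up, 5MB encrypted and sent @ 302.5Kbps (Effective rate: 4.3Mbps)'
    'I 09/07/20 06:34PM [Backup Set] Starting backup to CrashPlan Central: 18 files (169.30KB) to back up'
)

# Keep a "Completed backup to" line only if the next line (if there is one)
# does not contain "Unable to backup"
$completed = for ($i = 0; $i -lt $lines.Count; $i++) {
    $next = if ($i + 1 -lt $lines.Count) { $lines[$i + 1] } else { '' }
    if ($lines[$i] -match 'Completed backup to' -and $next -notmatch 'Unable to backup') {
        $lines[$i]
    }
}
$completed
```

The same loop works on the output of Get-Content, since that is also an array of lines.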
Upvotes: 0
Reputation: 25001
You can do the following:
$rawtext = Get-Content '.\new 1.txt' -Raw
$rawtext | Select-String -Pattern '(?m)^.*?Completed backup to.*$(?!\r?\n.*Unable to backup)' -AllMatches |
    ForEach-Object { $_.Matches.Value }
Explanation:
(?m) is multi-line mode, which allows ^ and $ to match at the beginning and end of each line.
(?!...) is a negative lookahead, which does not consume any characters. So from the end of the line ($), we look ahead and make sure we do not find an optional carriage return (\r?), a line feed (\n), any characters (.*, which stays on a single line since we aren't using (?s)), and then Unable to backup.
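One detail worth knowing: in .NET regex, $ in multi-line mode matches only before \n, so with CR LF line endings the .* in front of it also consumes the \r, and each returned value ends with a carriage return. A minimal sketch that rebuilds four lines of the question's log as a single CR LF string (standing in for Get-Content -Raw; $lines and $pattern are illustrative names) and strips that trailing \r from each result:

```powershell
# Four lines from the question's log, joined with CR LF as Get-Content -Raw would return them
$lines = @(
    'I 09/07/20 06:00PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:39s: 147 files (197.90MB) backed up, 5.30MB encrypted and sent @ 323.5Kbps (Effective rate: 2.7Mbps)'
    'I 09/07/20 06:00PM - Unable to backup 1 file (next attempt within 15 minutes)'
    'I 09/07/20 06:19PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:03s: 28 files (250MB) backed up, 5MB encrypted and sent @ 302.5Kbps (Effective rate: 4.3Mbps)'
    'I 09/07/20 06:34PM [Backup Set] Starting backup to CrashPlan Central: 18 files (169.30KB) to back up'
)
$rawtext = ($lines -join "`r`n") + "`r`n"

$pattern = '(?m)^.*?Completed backup to.*$(?!\r?\n.*Unable to backup)'

# Each match ends with a carriage return because .* also matches the \r before $,
# so remove it from every matched value
$completed = $rawtext |
    Select-String -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value -replace '\r$', '' }
$completed
```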
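The -AllMatches switch directs Select-String to keep searching after the first successful match. Because -Raw passes the whole file in as a single string, without this switch only the first qualifying line would be returned.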
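A quick way to see the effect of -AllMatches on a single input string is to count the Matches on the resulting MatchInfo with and without the switch. A sketch using a simplified, made-up sample string ($rawtext, $pattern, $first and $all are illustrative names):

```powershell
# Simplified sample with two "Completed" lines, joined with CR LF into one string as -Raw would return
$rawtext = "I Completed backup to A`r`nI Starting backup to B`r`nI Completed backup to C`r`n"
$pattern = '(?m)^.*Completed backup to.*$'

# One input string gives one MatchInfo; -AllMatches decides how many Matches it holds
$first = ($rawtext | Select-String -Pattern $pattern).Matches.Count
$all   = ($rawtext | Select-String -Pattern $pattern -AllMatches).Matches.Count
"Without -AllMatches: $first; with -AllMatches: $all"
```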
Using the -Raw switch is convenient because it lets us easily peek ahead to the next line of text. Without -Raw, we would need to track previous lines piped into Select-String. That is doable, but it is a different approach.
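For the line-by-line route, Select-String's -Context parameter can take care of that tracking: with -Context 0, 1, each MatchInfo carries the line that follows the match in Context.PostContext, which can then be tested for Unable to backup. A sketch of that alternative, with four lines of the question's log hard-coded in $lines (an illustrative name) instead of being read with Get-Content:

```powershell
# Four lines from the question's log, as Get-Content (without -Raw) would emit them
$lines = @(
    'I 09/07/20 06:00PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:39s: 147 files (197.90MB) backed up, 5.30MB encrypted and sent @ 323.5Kbps (Effective rate: 2.7Mbps)'
    'I 09/07/20 06:00PM - Unable to backup 1 file (next attempt within 15 minutes)'
    'I 09/07/20 06:19PM [Backup Set] Completed backup to CrashPlan Central in 0h:04m:03s: 28 files (250MB) backed up, 5MB encrypted and sent @ 302.5Kbps (Effective rate: 4.3Mbps)'
    'I 09/07/20 06:34PM [Backup Set] Starting backup to CrashPlan Central: 18 files (169.30KB) to back up'
)

# -Context 0, 1 keeps no lines before and one line after each match in .Context.PostContext.
# -match on that string array returns the matching elements, so -not is true when none match.
$completed = $lines |
    Select-String -Pattern 'Completed backup to' -Context 0, 1 |
    Where-Object { -not ($_.Context.PostContext -match 'Unable to backup') } |
    ForEach-Object { $_.Line }
$completed
```

Because each line is a separate input object, .Line here is just the matched line, with no trailing carriage return to clean up.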
(?s), or single-line mode, causes problems here when using . to match characters: in single-line mode . also matches newline characters, so .* can run past the end of the current line.
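To see the difference concretely, a small comparison using the .NET [regex]::Matches method directly on a simplified, made-up string ($withS and $withoutS are illustrative names):

```powershell
# Simplified three-line sample with CR LF endings
$rawtext = "I Starting backup`r`nI Completed backup to A`r`nI Starting backup`r`n"

# With (?s), the greedy .* crosses line breaks, so the match runs from the first line to the end of the string
$withS = [regex]::Matches($rawtext, '(?ms)^.*Completed.*$')

# Without (?s), . stops at \n, so the match covers only the Completed line (plus its \r)
$withoutS = [regex]::Matches($rawtext, '(?m)^.*Completed.*$')

$withS[0].Value.Length
$withoutS[0].Value.Length
```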
Since Select-String returns MatchInfo objects, you need to access the Value property of each object's Matches property to get the actual matched line.
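This is also why several of the question's attempts appear to "return the entire contents" of $rawtext: -Raw produces a single input string, so the MatchInfo's Line property (which is what the console displays) is the whole file, even when the regex matched only one line. When only the matched text is wanted, the .NET [regex]::Matches method returns the Match objects without the MatchInfo wrapper. A sketch on a simplified, made-up sample string ($info and $values are illustrative names):

```powershell
# Simplified sample: A is followed by an "Unable to backup" line, B is not
$rawtext = "I Completed backup to A`r`nI - Unable to backup 1 file`r`nI Completed backup to B`r`n"
$pattern = '(?m)^.*?Completed backup to.*$(?!\r?\n.*Unable to backup)'

# Select-String: a single MatchInfo whose .Line is the entire input string
$info = $rawtext | Select-String -Pattern $pattern -AllMatches
$info.Line

# [regex]::Matches returns the Match objects themselves; .Value is the matched text
$values = [regex]::Matches($rawtext, $pattern) | ForEach-Object { $_.Value }
$values
```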
Upvotes: 1