Bharath
Bharath

Reputation: 15

Extract text from a large file using powershell

We have an application that produces many large log files, which I want to parse using PowerShell and get the output in CSV or text with delimiter '|'. I tried to use select-string, but couldn't get the expected result. Below i have posted teh log format and expected result

Log File data:

How to achieve the above result using PowerShell?

Thanks

Upvotes: 0

Views: 933

Answers (2)

user6811411
user6811411

Reputation:

Just as mentioned in my comment you'll need to separate records and try to match your data with a complex regular expression.

See the RegEx live on regex101 Study the explanation of each element in the upper right corner of that link.

This script:

## Q:\Test\2018\11\29\SO_53541952.ps1

$LogFile = '.\SO_53541952.log'
$CsvFile = '.\SO_53541952.csv'
$ExcelFile='.\SO_53541952.xlsx'

## see the regex live <https://regex101.com/r/1TWm7i/1>
$RE = [RegEx]"(?sm)^Submitter Id +=> (?<SubmitterID>.*?$).*?^Start Time +=> (?<StartTime>[0-9:]{8}) +Start Date +=> (?<StartDate>[0-9\/]{10}).*?^Message Text +=> (?<MessageText>.*?$).*?^Src File +=> (?<SrcFile>.*?$).*?^Dest File +=> (?<DestFile>.*?$)"


$Data = (Get-Content $LogFile -raw) -split "(?sm)(?=^Record Id)" | ForEach-Object {
    If ($_ -match $RE){
        [PSCustomObject]@{
            'Submitter Id' = $Matches.SubmitterId
            'Start Time'   = $Matches.StartTime
            'Start Date'   = $Matches.StartDate
            'Message Text' = $Matches.MessageText
            'Src File'     = $Matches.SrcFile
            'Dest File'    = $Matches.DestFile
        }
    }
}
$Data | Format-Table -Auto
$Data | Export-Csv $CsvFile  -NoTypeInformation -Delimiter '|'

#$Data | Out-Gridview
## with the ImportExcel module you can directly generate an excel file
$Data | Export-Excel $ExcelFile -AutoSize # -Show

has this sample output on screen (I modified the samples to be distinguishable):

> .\SO_53541952.ps1

Submitter Id Start Time Start Date Message Text           Src File Dest File
------------ ---------- ---------- ------------           -------- ---------
STMDA@432... 00:02:51   11/29/2018 Copy step successfu... File1... c\temp...
STMDA@432... 00:02:52   11/29/2018 Copy step successfu... File2... c\temp...
STMDA@432... 00:02:53   11/29/2018 Copy step successfu... File3... c\temp...
STMDA@432... 00:02:54   11/29/2018 Copy step successfu... File4... c\temp...

and with Doug Finke's ImportExcel module installed, you'll directly get an .xlsx file:

enter image description here

Upvotes: 2

Theo
Theo

Reputation: 61253

As LotPings suggested, you need to break up the log file content into separate blocks. Then using regex, you can capture the required values and store them in objects which you can then export to a CSV file.

Something like this:

$log = @"
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4322
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File2
Dest File         => c\temp2
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4321
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File1
Dest File         => c\temp1
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4323
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File3
Dest File         => c\temp3
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4324
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File4
Dest File         => c\temp4
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
"@

# first break the log into 'Record Id' blocks
$blocks = @()
$regex = [regex] '(?m)(Record Id[^-]+)'
$match = $regex.Match($log)
while ($match.Success) {
    $blocks += $match.Value
    $match = $match.NextMatch()
} 

# next, parse out the required values for each block and create objects to export
$blocks | ForEach-Object {
    if ($_ -match '(?s)Submitter Id\s+=>\s+(?<submitter>[^\s]+).+Start Time\s+=>\s+(?<starttime>[^\s]+)\s+Start Date\s+=>\s+(?<startdate>[^\s]+).+Message Text\s+=>\s+(?<messagetext>[\w ,.;-_]+).+Src File\s+=>\s+(?<sourcefile>[\w ,.;-_]+).+Dest File\s+=>\s+(?<destinationfile>[\w ,.;-_]+)') {
        [PSCustomObject]@{
            'Submitter Id' = $matches['submitter']
            'Start Time'   = $matches['starttime']
            'Start Date'   = $matches['startdate']
            'Message Text' = $matches['messagetext']
            'Src File'     = $matches['sourcefile']
            'Dest File'    = $matches['destinationfile']
        }
    }
} | Export-Csv -Path '<PATH_TO_YOUR_OUTPUT_CSV>' -Delimiter '|' -NoTypeInformation

This will result in a csv file with the following content:

"Submitter Id"|"Start Time"|"Start Date"|"Message Text"|"Src File"|"Dest File"
"STMDA@4322"|"00:02:59"|"11/29/2018"|"Copy step successful."|"File2"|"c\temp2"
"STMDA@4321"|"00:02:59"|"11/29/2018"|"Copy step successful."|"File1"|"c\temp1"
"STMDA@4323"|"00:02:59"|"11/29/2018"|"Copy step successful."|"File3"|"c\temp3"
"STMDA@4324"|"00:02:59"|"11/29/2018"|"Copy step successful."|"File4"|"c\temp4"

Upvotes: 1

Related Questions