His Royal Redness

Reputation: 781

Get powershell regex captures into a table

I'm trying to extract a set of data from some (large) text files. Basically, each line looks something like this:

2011-12-09 18:20:55, ABC.EXE[3b78], The rest of the line...

I'd like to get the date and the bit between the square brackets (the process ID), and then compile a table. The second stage of the task is to group this table so that I get the earliest date for each process ID. In effect, that gives me the date and time of the first log entry per process ID, which should approximate the start time of that instance of the process.

What I've got so far (split onto separate lines for readability):

gci -filter *.log -r |
 select-string '(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}), ABC\.EXE\[(.{4})' |
 % { $_.matches } | % { $_.groups } | % { $_.value }

spits out the captures. I'd like to ignore the first capture (the whole match) and combine the second and third onto the same line.

Help? Please?

Edit: DOH! Can't answer my own question. So...

OK, I think I'm on the right track. An SO question here helped me to get the individual parts I wanted, namely:

$_.matches[0].groups[1].value, $_.matches[0].groups[2].value

Then, an MSDN article here shows how to 'clump' the bits into an object, which allows them to be grouped, sorted and manipulated. Final result:

gci -filter *.log | select-string '(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}), ABC\.EXE\[(.{4})' |
 % { new-object object |
  add-member NoteProperty Name $_.matches[0].groups[1].value -passthru |
  add-member NoteProperty PId $_.matches[0].groups[2].value -passthru }

Quite messy, so if anyone knows of a cleaner way to do it, please let me know.

Upvotes: 4

Views: 2782

Answers (1)

Joey

Reputation: 354794

You can create new objects more simply in PowerShell v2, where the New-Object cmdlet supports a -Property parameter that takes a hashtable of properties:

New-Object PSObject -Property @{
    Name = $_.matches[0].groups[1].value
    PId = $_.matches[0].groups[2].value
}

Generally, I'd do the processing a little differently, though:

# prepare table
$data = $(switch -Regex -File filename {
    '^[^,]+' { $date = [datetime]$Matches[0] }
    '(?<=\[)[^\]]+' { $id = $Matches[0] }
    '$' { New-Object PSObject -Property @{
        Date = $date
        PId = $id
    } }
})

Using switch -Regex has become a nice way (to me at least) to write quick-and-dirty parsers for text data. Without a break, switch runs every case that matches, and here all three match each line, so splitting them up is just a convenient way to separate the different parts of the parsing. The first case grabs the date and time and stores it in a variable (as a DateTime value, even); the second gets the process ID; and the third, which matches the end of the line, puts it all together.

Just a personal preference, though; I have actually never used Select-String.
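To see the "every matching case runs" behaviour without touching a file, here is a small sketch that feeds a few made-up lines (in the format from the question) to the same switch. The sample values and the $lines and $sample names are assumptions for illustration only:

```powershell
# Made-up sample lines in the log format from the question.
$lines = @(
    '2011-12-09 18:20:55, ABC.EXE[3b78], first entry'
    '2011-12-09 18:21:10, ABC.EXE[4c10], another process'
    '2011-12-09 18:19:02, ABC.EXE[3b78], earlier entry'
)

# switch iterates over the array; for each line, all three cases match
# and run in order because none of them ends with "break".
$sample = @(switch -Regex ($lines) {
    '^[^,]+'        { $date = [datetime]$Matches[0] }   # text before the first comma
    '(?<=\[)[^\]]+' { $id = $Matches[0] }               # text between [ and ]
    '$'             { New-Object PSObject -Property @{ Date = $date; PId = $id } }
})

$sample | Format-Table PId, Date
```

One thing to keep in mind with this pattern: if a line has no bracketed ID, the second case is skipped and $id still holds the value from the previous line.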

$data |
    group PId |
    foreach { New-Object PSObject -Property @{
        PId = $_.Name
        MinDate = @($_.Group | sort Date)[0].Date
    } }

This then uses the just-compiled data, groups it by process ID and outputs the ID with the minimum date for each one.

Note that this is more of a "looks nice in code" approach. It builds an object for every line and holds them all in memory for grouping, so if the files you're dealing with are really large, you probably want something more efficient.
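One possible way to cut the overhead, as a rough sketch rather than a measured optimisation: keep only the earliest date per process ID in a hashtable while streaming through the lines, so no per-line objects are created or grouped. Get-FirstSeen is a hypothetical helper name, and the regex assumes the line format from the question:

```powershell
# Hypothetical helper: streams lines from the pipeline and remembers only
# the earliest timestamp seen for each process ID.
function Get-FirstSeen {
    param([Parameter(ValueFromPipeline = $true)][string]$Line)
    begin { $first = @{} }
    process {
        # Assumes the "date, ABC.EXE[id]" layout shown in the question.
        if ($Line -match '^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}), ABC\.EXE\[([^\]]+)\]') {
            $date = [datetime]$Matches[1]
            $id   = $Matches[2]
            if ((-not $first.ContainsKey($id)) -or ($date -lt $first[$id])) {
                $first[$id] = $date
            }
        }
    }
    end { $first }   # hashtable: process ID -> earliest DateTime
}

# Usage with made-up sample lines:
$result = @(
    '2011-12-09 18:20:55, ABC.EXE[3b78], first entry'
    '2011-12-09 18:21:10, ABC.EXE[4c10], another process'
    '2011-12-09 18:19:02, ABC.EXE[3b78], earlier entry'
) | Get-FirstSeen
```

With real files the input would come from something like gci -filter *.log -r | Get-Content | Get-FirstSeen; whether that is fast enough depends on the size of the logs.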

Upvotes: 4
