Reputation: 103
I'm getting started with PowerShell and my knowledge is still very limited. I have a .log file that looks like the following:
18.7.2017 12:59:15 Starting thread: KEYWORD1
18.7.2017 12:59:33 Thread finished; ... KEYWORD1
18.7.2017 13:32:19 Starting thread: KEYWORD2
18.7.2017 13:34:8 Thread finished;... KEYWORD2
I now want to find out whether every thread that started has also finished. If there is an unfinished thread, I want to compare its timestamp with the current time.
I thought a hashtable would do the trick, and this is what I came up with:
foreach($line in Get-Content $sourceDirectory)
{
    if($line -like "*Starting thread*")
    {
        $arrStart = $line -split ' '
        $startThreads=$arrStart[$arrStart.Length-1]
        # Key: keyword (last token); value: time (second token)
        $hashmap1 = @{$arrStart[$arrStart.Length-1] = $arrStart[1]}
    }
    if($line -like "*Thread finished*")
    {
        $arrEnd = $line -split ' '
        $hashmap2 = @{$arrEnd[$arrEnd.Length-1] = $arrEnd[1]}
        $endThreads=($arrEnd[1]+" "+$arrEnd[$arrEnd.Length-1])
    }
}
How is it possible to compare these two hashmaps now?
Upvotes: 3
Views: 1511
Reputation: 437648
In a comment on the question, JPBlanc recommends grouping the records, and the Group-Object cmdlet indeed offers a conceptually elegant solution:
Note: The assumption is that if a given keyword only has one entry, it is always the starting entry.
Select-String 'Starting thread:|Thread finished;' file.log |
Group-Object { (-split $_)[-1] } | Where-Object { $_.Count % 2 -eq 1 }
The Select-String call extracts only the lines of interest (a thread starting, a thread finishing), using a regex (regular expression).
The Group-Object call groups the resulting lines by the last ([-1]) whitespace-separated token (-split ...) on each line ($_), i.e., the keywords.
Where-Object then returns only those resulting groups that have an odd number of entries, i.e., those that aren't paired, representing the started-but-not-finished threads.
This yields something like the following:
Count Name Group
----- ---- -----
1 KEYWORD3 {/Users/jdoe/file.log:5:28.8.2018 08:59:16 Starting thread: KEYWORD3}
This is probably not the format you want, but given that the outputs are objects, as is typical in PowerShell, you can easily process them to your liking programmatically.
Technically, the above command outputs [Microsoft.PowerShell.Commands.GroupInfo] instances whose .Group property in this case contains [Microsoft.PowerShell.Commands.MatchInfo] instances, as output by Select-String.
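As a minimal sketch of such processing, the following reshapes each unpaired group into a custom object built from the MatchInfo properties .Line and .LineNumber. The sample log content, the temp-file name, and the output property names are assumptions made for illustration only.

```powershell
# Hypothetical sample data in the question's format, written to a temp file.
$logPath = Join-Path ([System.IO.Path]::GetTempPath()) 'threads-sample.log'
@'
18.7.2017 12:59:15 Starting thread: KEYWORD1
18.7.2017 12:59:33 Thread finished; ... KEYWORD1
28.8.2018 08:59:16 Starting thread: KEYWORD3
'@ | Set-Content -LiteralPath $logPath

$unfinished = Select-String 'Starting thread:|Thread finished;' $logPath |
  Group-Object { (-split $_.Line)[-1] } |     # group by keyword (last token of the line)
  Where-Object { $_.Count % 2 -eq 1 } |       # keep only unpaired groups
  ForEach-Object {
    $first = $_.Group[0]                      # a MatchInfo instance
    [pscustomobject]@{
      Keyword    = $_.Name                    # the grouping key
      LineNumber = $first.LineNumber          # line number within the log file
      Line       = $first.Line                # the matching line's text
    }
  }

$unfinished

Remove-Item -LiteralPath $logPath             # clean up the temp file
```

Grouping on $_.Line rather than on the stringified MatchInfo keeps the file path out of the text being split, which matters only if the path itself contains whitespace.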
The following code extends the initial command to produce custom output that reports how much time has elapsed since each unfinished thread started:
$now = Get-Date
Select-String 'Starting thread:|Thread finished;' file.log |
  Group-Object { (-split $_)[-1] } | Where-Object { $_.Count % 2 -eq 1 } | ForEach-Object {
    foreach ($matchInfo in $_.Group) { # loop over started-only lines
      $tokens = -split $matchInfo.Line # split into tokens by whitespace
      $date, $time = $tokens[0..1] # extract date and time (first 2 tokens)
      $keyword = $tokens[-1] # extract keyword (last token)
      # Parse date+time into a [datetime] instance.
      # Note: Depending on the current culture, [datetime]::Parse("$date $time") may do.
      $start = [datetime]::ParseExact("$date $time", 'd\.M\.yyyy HH:mm:ss', [cultureinfo]::InvariantCulture)
      # Custom output string containing how long ago the thread was started:
      "Thread $keyword hasn't finished yet; time elapsed since it started: " +
        ($now - $start).ToString('g')
    }
  }
This yields something like the following:
Thread KEYWORD3 hasn't finished yet; time elapsed since it started: 2:03:35.347563
2:03:35.347563 (2 hours, 3 minutes, ...) is the string representation of a [TimeSpan] instance that is the result of subtracting two points in time ([datetime] instances).
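Because the difference is a [TimeSpan] object rather than a string, it can also be compared directly against a limit. The sketch below uses two fixed, hypothetical timestamps (the second stands in for Get-Date) and an assumed one-hour threshold so that the result is reproducible.

```powershell
$format = 'd\.M\.yyyy HH:mm:ss'
$inv    = [cultureinfo]::InvariantCulture

# Hypothetical values: a thread's start time and a fixed stand-in for "now" (Get-Date).
$start = [datetime]::ParseExact('28.8.2018 08:59:16', $format, $inv)
$now   = [datetime]::ParseExact('28.8.2018 11:02:51', $format, $inv)

$elapsed   = $now - $start               # subtracting two [datetime]s yields a [timespan]
$threshold = [timespan]::FromHours(1)    # assumed limit, for illustration only

$elapsed.ToString('g')                   # short general format, here: 2:03:35
$isOverdue = $elapsed -gt $threshold     # [timespan] values support comparison operators
if ($isOverdue) {
  "Thread has been running for more than $($threshold.TotalHours) hour(s)."
}
```

Properties such as .TotalMinutes or .TotalSeconds give the same duration as a single number, which can be more convenient for reporting.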
Upvotes: 2
Reputation: 8432
One way to do this is to use RegEx to pull each line apart, then create a new object from the details. For example:
Get-Content .\data.txt |
    ForEach-Object {
        if ($_ -match "^(?<time>(\d+\.){2}\d+ (\d{2}:){2}\d{2}).*(?<state>Starting|finished).*\b(?<keyword>\w+)$")
        {
            [PsCustomObject]@{
                Keyword = $matches.keyword
                Action = $(if($matches.state -eq "Starting"){"Start"}else{"Finish"})
                Time = (Get-Date $matches.time)
            }
        }
    }
Assume you have a log file (data.txt) with the following content:
18.7.2017 12:59:15 Starting thread: KEYWORD1
18.7.2017 13:32:19 Starting thread: KEYWORD2
18.7.2017 12:59:15 Starting thread: KEYWORD3
18.7.2017 13:34:18 Thread finished;... KEYWORD2
18.7.2017 12:59:15 Starting thread: KEYWORD4
18.7.2017 13:34:18 Thread finished;... KEYWORD3
18.7.2017 12:59:15 Starting thread: KEYWORD5
18.7.2017 13:34:18 Thread finished;... KEYWORD5
Running the above code against it gives this output:
Keyword Action Time
------- ------ ----
KEYWORD1 Start 18/07/2017 12:59:15
KEYWORD2 Start 18/07/2017 13:32:19
KEYWORD3 Start 18/07/2017 12:59:15
KEYWORD2 Finish 18/07/2017 13:34:18
KEYWORD4 Start 18/07/2017 12:59:15
KEYWORD3 Finish 18/07/2017 13:34:18
KEYWORD5 Start 18/07/2017 12:59:15
KEYWORD5 Finish 18/07/2017 13:34:18
This isn't much of an improvement over the raw file, but now that you have some objects, you can more easily process them. For example, you can see which ones have no matching start/finish by appending the following after the final closing brace:
| Group-Object Keyword -NoElement | Sort-Object Count -Descending
This gives output like this:
Count Name
----- ----
2 KEYWORD2
2 KEYWORD3
2 KEYWORD5
1 KEYWORD1
1 KEYWORD4
It is now easier to see which ones have a start/finish pair (i.e., those with 2 items in their group).
This is probably overkill for your scenario, but since you said you are new to PowerShell, I thought I'd mention it, as it is often very useful to turn text into objects like this for processing.
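To go one step further, the objects can be filtered down to just the threads that never finished, keeping each start time for later comparison. This is only a sketch under stated assumptions: the sample lines are inline, the regex is a simplified variant of the one above, and the time is parsed with [datetime]::ParseExact and a fixed format (instead of Get-Date) so that the result does not depend on the current culture. It relies on member enumeration and the simplified Where-Object syntax, both of which need PowerShell 3 or later.

```powershell
# Hypothetical sample lines in the question's format.
$lines = @(
  '18.7.2017 12:59:15 Starting thread: KEYWORD1'
  '18.7.2017 13:32:19 Starting thread: KEYWORD2'
  '18.7.2017 13:34:18 Thread finished;... KEYWORD2'
)

# Turn each line into an object with Keyword, Action and Time properties.
$events = $lines | ForEach-Object {
  if ($_ -match '^(?<time>\S+ \S+).*(?<state>Starting|finished).*\b(?<keyword>\w+)$') {
    [pscustomobject]@{
      Keyword = $Matches.keyword
      Action  = $(if ($Matches.state -eq 'Starting') { 'Start' } else { 'Finish' })
      Time    = [datetime]::ParseExact($Matches.time, 'd\.M\.yyyy H:m:s',
                  [cultureinfo]::InvariantCulture)
    }
  }
}

# Keep the Start entries of every keyword that has no Finish entry.
$unfinished = $events | Group-Object Keyword |
  Where-Object { $_.Group.Action -notcontains 'Finish' } |
  ForEach-Object { $_.Group | Where-Object Action -eq 'Start' }

$unfinished
```

Each remaining object carries a [datetime] in its Time property, so subtracting it from the current time gives the elapsed time directly.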
Upvotes: 1
Reputation: 10044
It looks like you are trying to make two hashtables, one for started threads and one for finished ones, with the keyword being the important information. Since you really only need one piece of information, an array would be a better data type than a hashtable.
# Find lines with `Starting thread` and drop everything before the final space to get the array of KEYWORDS that started
# (named parameters are used because Select-String's first positional parameter is the pattern, not the path)
$Start = (Select-String -Path $sourceDirectory -Pattern 'Starting thread') -replace '^.*Starting thread.*\s+'
# Find lines with `Thread finished` and drop everything before the final space to get the array of KEYWORDS that finished
$Finish = (Select-String -Path $sourceDirectory -Pattern 'Thread finished') -replace '^.*Thread finished.*\s+'
# Find everything that started but hasn't finished.
$Start.where({$_ -notin $Finish})
Notes: Requires PS4+ for the where method and -notin. Also, the assumption was made that a thread doesn't start and stop multiple times.
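If a thread can start and stop more than once, one possible alternative, closer to the hashtable idea in the question, is a single hashtable that gains an entry when a thread starts and loses it when the thread finishes. The sketch below uses hypothetical inline sample lines; the variable names are illustrative and not taken from the question.

```powershell
# Hypothetical sample lines in the question's format.
$lines = @(
  '18.7.2017 12:59:15 Starting thread: KEYWORD1'
  '18.7.2017 12:59:33 Thread finished; ... KEYWORD1'
  '18.7.2017 13:32:19 Starting thread: KEYWORD2'
)

$running = @{}   # keyword -> timestamp of the most recent unmatched start

foreach ($line in $lines) {
  $tokens  = -split $line      # split the line into whitespace-separated tokens
  $keyword = $tokens[-1]       # the keyword is the last token
  if ($line -like '*Starting thread*') {
    # Remember date and time (first two tokens) of the start.
    $running[$keyword] = "$($tokens[0]) $($tokens[1])"
  }
  elseif ($line -like '*Thread finished*') {
    # The thread finished, so it is no longer pending.
    $running.Remove($keyword)
  }
}

# Whatever is left started but never finished.
$running
```

Because only one hashtable is involved, no comparison step is needed afterwards: the remaining keys are the unfinished threads and the values are their start timestamps.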
Upvotes: 1