Manu

Reputation: 103

How can I compare a hashtable with another one using Powershell?

I'm getting started with PowerShell and my knowledge is still very limited. I have a .log file that looks like this:

18.7.2017 12:59:15  Starting thread: KEYWORD1
18.7.2017 12:59:33  Thread finished; ... KEYWORD1
18.7.2017 13:32:19  Starting thread: KEYWORD2
18.7.2017 13:34:8  Thread finished;... KEYWORD2

I now want to find out whether every thread that started has also finished. If there is an unfinished thread, I want to compare its timestamp with the current time.

I thought a hashtable would do the trick, and this is what I came up with:

foreach($line in Get-Content $sourceDirectory)
{
    if($line -like "*Starting thread*")
    {
        $arrStart = $line -split ' '
        $startThreads=$arrStart[$arrStart.Length-1]
        $hashmap1 = @{$arrStart[$arrStart.Length-1] = $arrStart[1]}
    }

    if($line -like "*Thread finished*")
    {
        $arrEnd = $line -split ' '
        $hashmap2 = @{$arrEnd[$arrEnd.Length-1] = $arrEnd[1]}
        $endThreads=($arrEnd[1]+" "+$arrEnd[$arrEnd.Length-1])
    }
}

How is it possible to compare these two hashmaps now?

Upvotes: 3

Views: 1511

Answers (3)

mklement0

Reputation: 437648

In a comment on the question, JPBlanc recommends grouping the records, and the Group-Object cmdlet indeed offers a conceptually elegant solution:

Note: The assumption is that if a given keyword only has one entry, it is always the starting entry.

Select-String 'Starting thread:|Thread finished;' file.log | 
  Group-Object { (-split $_)[-1] } | Where-Object { $_.Count % 2 -eq 1 }
  • The Select-String call extracts only the lines of interest (a thread starting, a thread finishing), using a regex (regular expression).

  • The Group-Object call groups the resulting lines by the last ([-1]) whitespace-separated token (-split ...) on each line ($_), i.e., the keywords.

  • Where-Object then returns only those resulting groups that have an odd number of entries, i.e., those that aren't paired, representing the started-but-not-finished threads.

This yields something like the following:

Count Name          Group
----- ----          -----
    1 KEYWORD3      {/Users/jdoe/file.log:5:28.8.2018 08:59:16  Starting thread: KEYWORD3}

This is probably not the format you want, but given that the outputs are objects, as is typical in PowerShell, you can easily process them to your liking programmatically.

Technically, the above command outputs [Microsoft.PowerShell.Commands.GroupInfo] instances whose .Group property in this case contains [Microsoft.PowerShell.Commands.MatchInfo] instances, as output by Select-String.
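
As a minimal sketch of such post-processing, the following projects each unpaired group to its keyword and original line, using the .Name property of the group and the .Line property of the MatchInfo instances. The in-memory sample lines are hypothetical data modeled on the log format in the question and stand in for file.log:

```powershell
# Hypothetical sample data standing in for file.log.
$lines = @(
  '18.7.2017 12:59:15  Starting thread: KEYWORD1'
  '18.7.2017 12:59:33  Thread finished; ... KEYWORD1'
  '18.7.2017 13:32:19  Starting thread: KEYWORD2'
)

$unfinished = $lines |
  Select-String 'Starting thread:|Thread finished;' |
  Group-Object { (-split $_)[-1] } |
  Where-Object { $_.Count % 2 -eq 1 } |
  ForEach-Object {
    [pscustomobject]@{
      Keyword = $_.Name            # the grouping key, i.e., the keyword
      Line    = $_.Group[0].Line   # original text of the first line in the group
    }
  }

$unfinished
```

With this sample data, only KEYWORD2 is reported, because it is the only keyword with an odd number of entries.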


The following code extends the command above to produce custom output that reports how much time has elapsed since each unfinished thread started:

$now = Get-Date
Select-String 'Starting thread:|Thread finished;' file.log  | 
  Group-Object { (-split $_)[-1] } | Where-Object { $_.Count % 2 -eq 1 } | ForEach-Object {
    foreach ($matchInfo in $_.Group) { # loop over started-only lines
      $tokens = -split $matchInfo.Line # split into tokens by whitespace
      $date, $time = $tokens[0..1]     # extract date and time (first 2 tokens)
      $keyword = $tokens[-1]           # extract keyword (last token)
      # Parse date+time into a [datetime] instance.
      # Note: Depending on the current culture, [datetime]::Parse("$date $time") may do.
      $start = [datetime]::ParseExact("$date $time", 'd\.M\.yyyy HH:mm:ss', [cultureinfo]::InvariantCulture)
      # Custom output string containing how long ago the thread was started:
      "Thread $keyword hasn't finished yet; time elapsed since it started: " +
        ($now - $start).ToString('g')
    }
  }

This yields something like the following:

Thread KEYWORD3 hasn't finished yet; time elapsed since it started: 2:03:35.347563

2:03:35.347563 (2 hours, 3 minutes, ...) is the string representation of a [TimeSpan] instance that is the result of subtracting two points in time ([datetime] instances).
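
To illustrate the subtraction with fixed values, here is a minimal sketch. Both timestamps are hypothetical, with the second one standing in for the current time, and the one-hour threshold is an assumption chosen purely for illustration:

```powershell
$culture = [cultureinfo]::InvariantCulture
$format  = 'd\.M\.yyyy HH:mm:ss'

# Hypothetical start time of a thread and a fixed stand-in for "now".
$start = [datetime]::ParseExact('28.8.2018 08:59:16', $format, $culture)
$now   = [datetime]::ParseExact('28.8.2018 11:02:51', $format, $culture)

# Subtracting two [datetime] instances yields a [timespan].
$elapsed = $now - $start
$elapsedText = $elapsed.ToString('g', $culture)   # 2:03:35

# A [timespan] can be compared directly against a threshold (assumed here: 1 hour).
$isOverdue = $elapsed -gt [timespan]::FromHours(1)

$elapsedText
$isOverdue
```

Because a [TimeSpan] also exposes properties such as .TotalMinutes and .TotalHours, a numeric comparison works just as well as comparing against another [TimeSpan].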

Upvotes: 2

boxdog

Reputation: 8432

One way to do this is to use a regex to pull each line apart, then create a new object from the details. For example:

Get-Content .\data.txt |
    ForEach-Object {
        if ($_ -match "^(?<time>(\d+\.){2}\d+ (\d{2}:){2}\d{2}).*(?<state>Starting|finished).*\b(?<keyword>\w+)$")
        {
            [PsCustomObject]@{
                Keyword = $matches.keyword
                Action = $(if($matches.state -eq "Starting"){"Start"}else{"Finish"})
                Time = (Get-Date $matches.time)
            }
        }
    }

Assume you have a log file (data.txt) with the following content:

18.7.2017 12:59:15  Starting thread: KEYWORD1
18.7.2017 13:32:19  Starting thread: KEYWORD2
18.7.2017 12:59:15  Starting thread: KEYWORD3
18.7.2017 13:34:18  Thread finished;... KEYWORD2
18.7.2017 12:59:15  Starting thread: KEYWORD4
18.7.2017 13:34:18  Thread finished;... KEYWORD3
18.7.2017 12:59:15  Starting thread: KEYWORD5
18.7.2017 13:34:18  Thread finished;... KEYWORD5

Running the above code against it gives the following output:

Keyword  Action Time               
-------  ------ ----               
KEYWORD1 Start  18/07/2017 12:59:15
KEYWORD2 Start  18/07/2017 13:32:19
KEYWORD3 Start  18/07/2017 12:59:15
KEYWORD2 Finish 18/07/2017 13:34:18
KEYWORD4 Start  18/07/2017 12:59:15
KEYWORD3 Finish 18/07/2017 13:34:18
KEYWORD5 Start  18/07/2017 12:59:15
KEYWORD5 Finish 18/07/2017 13:34:18

This isn't much of an improvement over the raw file, but now that you have some objects, you can more easily process them. For example, you can see which ones have no matching start/finish by appending the following after the last closing brace:

| Group-Object Keyword -NoElement | Sort-Object Count -Descending

This gives output like this:

Count Name                     
----- ----                     
    2 KEYWORD2                 
    2 KEYWORD3                 
    2 KEYWORD5                 
    1 KEYWORD1                 
    1 KEYWORD4  

It is now easier to see which ones have a start/finish pair (i.e., have 2 items in their group).
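
Building on that, here is a hedged sketch of reducing such objects to only the unfinished threads while keeping their start times. It uses hypothetical in-memory sample lines instead of data.txt, parses the timestamp with [datetime]::ParseExact rather than Get-Date so that it doesn't depend on the current culture, and assumes that each keyword starts at most once:

```powershell
# Hypothetical sample lines standing in for data.txt.
$lines = @(
    '18.7.2017 12:59:15  Starting thread: KEYWORD1'
    '18.7.2017 13:32:19  Starting thread: KEYWORD2'
    '18.7.2017 13:34:18  Thread finished;... KEYWORD2'
)

$pattern = '^(?<time>(\d+\.){2}\d+ (\d{2}:){2}\d{2}).*(?<state>Starting|finished).*\b(?<keyword>\w+)$'

# Turn each matching line into an object, as in the code above.
$events = foreach ($line in $lines) {
    if ($line -match $pattern) {
        [PsCustomObject]@{
            Keyword = $Matches.keyword
            Action  = $(if ($Matches.state -eq 'Starting') { 'Start' } else { 'Finish' })
            Time    = [datetime]::ParseExact($Matches.time, 'd\.M\.yyyy HH:mm:ss', [cultureinfo]::InvariantCulture)
        }
    }
}

# Keywords that occur only once have a start entry but no finish entry.
$unfinished = $events |
    Group-Object Keyword |
    Where-Object Count -eq 1 |
    ForEach-Object { $_.Group[0] }

$unfinished
```

Each resulting object still carries its Time property, so the elapsed time can be computed with (Get-Date) - $_.Time.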

This is probably a bit overkill for your scenario, but since you said you are new to PowerShell, I thought I'd mention it, as it is often very useful to turn text into objects like this for processing.

Upvotes: 1

BenH

Reputation: 10044

It looks like you are trying to build two hashtables, one for started threads and one for finished ones, where the important information is the keyword. Since you really only need that one piece of information, an array would be a better data type than a hashtable.

# Find lines with `Starting thread` and drop everything up to the final whitespace to get the array of KEYWORDS that started
# (-Pattern is Select-String's first positional parameter, so the parameters are named explicitly here)
$Start = (Select-String -Path $sourceDirectory -Pattern 'Starting thread') -replace '^.*Starting thread.*\s+'
# Find lines with `Thread finished` and drop everything up to the final whitespace to get the array of KEYWORDS that finished
$Finish = (Select-String -Path $sourceDirectory -Pattern 'Thread finished') -replace '^.*Thread finished.*\s+'
# Find everything that started but hasn't finished.
$Start.where({$_ -notin $Finish})

Notes: This requires PowerShell 4+ for the .where() method (the -notin operator requires version 3+). It also assumes that a thread doesn't start and stop multiple times.
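
If the start timestamps are needed as well (the question wants to compare them with the current time), one possible variation is a single hashtable keyed by keyword: add an entry when a thread starts and remove it when the thread finishes. The following is only a sketch under the same assumption that a thread doesn't start and stop multiple times; the in-memory sample lines are hypothetical and stand in for the log file:

```powershell
# Hypothetical sample lines standing in for the log file.
$lines = @(
    '18.7.2017 12:59:15  Starting thread: KEYWORD1'
    '18.7.2017 12:59:33  Thread finished; ... KEYWORD1'
    '18.7.2017 13:32:19  Starting thread: KEYWORD2'
)

$open = @{}   # keyword -> start timestamp (as text) of threads not yet finished

foreach ($line in $lines) {
    $tokens  = -split $line      # split into whitespace-separated tokens
    $keyword = $tokens[-1]       # the keyword is the last token

    if ($line -like '*Starting thread*') {
        # Remember when this thread started (date and time are the first 2 tokens).
        $open[$keyword] = "$($tokens[0]) $($tokens[1])"
    }
    elseif ($line -like '*Thread finished*') {
        # The thread finished, so it is no longer open.
        $open.Remove($keyword)
    }
}

# Whatever is left started but never finished.
$open
```

With this sample data, $open ends up with a single entry, KEYWORD2, whose value is the start timestamp text that can then be parsed and compared with the current time.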

Upvotes: 1
