Reputation: 9601
I've created a script which analyzes the debug logs from Windows DNS Server.
It does the following:
1. Opens the debug log using the [System.IO.File] class
2. Reads each line

Steps 1 and 2 take the longest. In fact, they take a seemingly endless amount of time, because the file is growing as it is being read.
Due to the size of the debug log (80,000 KB), it takes a very long time.
I believe that my code is fine for smaller text files, but it fails to deal with much larger files.
Here is my code: https://github.com/cetanu/msDnsStats/blob/master/msdnsStats.ps1
This is what the debug log looks like (including the blank lines):
Multiply this by about 100,000,000 and you have my debug log.
21/03/2014 2:20:03 PM 0D0C PACKET 0000000005FCB280 UDP Rcv 202.90.34.177 3709 Q [1001 D NOERROR] A (2)up(13)massrelevance(3)com(0)
21/03/2014 2:20:03 PM 0D0C PACKET 00000000042EB8B0 UDP Rcv 67.215.83.19 097f Q [0000 NOERROR] CNAME (15)manchesterunity(3)org(2)au(0)
21/03/2014 2:20:03 PM 0D0C PACKET 0000000003131170 UDP Rcv 62.36.4.166 a504 Q [0001 D NOERROR] A (3)ekt(4)user(7)net0319(3)com(0)
21/03/2014 2:20:03 PM 0D0C PACKET 00000000089F1FD0 UDP Rcv 80.10.201.71 3e08 Q [1000 NOERROR] A (4)dns1(5)offis(3)com(2)au(0)
I need ways or ideas on how to open and read each line of a file more quickly than what I am doing now.
I am open to suggestions of using a different language.
Upvotes: 0
Views: 116
Reputation: 68273
I would trade this:
$dnslog = [System.IO.File]::Open("c:\dns.log","Open","Read","ReadWrite")
$dnslog_content = New-Object System.IO.StreamReader($dnslog)
For ($i=0;$i -lt $dnslog.length; $i++)
{
    $line = $dnslog_content.readline()
    if ($line -eq $null) { continue }

    # REGEX MATCH EACH LINE OF LOGFILE
    $pattern = $line | select-string -pattern $regex

    # IGNORE EMPTY MATCH
    if ($pattern -eq $null) {
        continue
    }
for this:
Get-Content 'c:\dns.log' -ReadCount 1000 |
  ForEach-Object {
    foreach ($line in $_)
    {
        if ($line -match $regex)
        {
            #Process matches
        }
    }
  }
That will reduce the number of file read operations by a factor of 1000.
Trading out the Select-String operation will require re-factoring the rest of the code to work with $matches[n] instead of $pattern.matches[0].groups[$n].value, but it is much faster. Select-String returns MatchInfo objects, which carry a lot of additional information about the match (line number, filename, etc.); that's great if you need it, but if all you need is the strings from the captures, it's wasted effort.
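For example, a minimal sketch of that refactor (the capture-group index 1 and the $regex variable are carried over from the snippets here, not from the full script):

# Before: Select-String returns a MatchInfo object
$pattern = $line | Select-String -Pattern $regex
$date = $pattern.Matches[0].Groups[1].Value

# After: -match populates the automatic $matches hashtable
if ($line -match $regex)
{
    $date = $matches[1]
}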
You're creating an object ($log), and then accumulating values into array properties:
$log.date += @($pattern.matches[0].groups[$n].value); $n++
That array addition is going to kill your performance. Also, hash table operations are faster than object property updates.
I'd create $log as a hash table first, and the key values as array lists:
$log = @{}
$log.date = New-Object collections.arraylist
Then inside your loop:
$log.date.Add($matches[1]) > $null
Then create your object from $log after you've populated all of the array lists.
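Putting it together, a rough sketch (the date field and capture-group index 1 are placeholders taken from the snippets above; any additional fields would follow the same pattern):

$log = @{}
$log.date = New-Object System.Collections.ArrayList

Get-Content 'c:\dns.log' -ReadCount 1000 |
  ForEach-Object {
    foreach ($line in $_)
    {
        if ($line -match $regex)
        {
            # ArrayList.Add() returns the new index; discard it
            $log.date.Add($matches[1]) > $null
        }
    }
  }

# Build the output object once, after all the array lists are populated
$result = New-Object PSObject -Property $log

ArrayList.Add is amortized O(1), whereas += on a plain array allocates a new array and copies every existing element on each append, which is what makes it so slow over ~100 million lines.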
Upvotes: 1
Reputation: 24071
As a general piece of advice, use Measure-Command to find out which script blocks take the longest time.
That being said, the sleep call seems a bit odd. If I'm not mistaken, you sleep 20 ms after each row:
sleep -milliseconds 20
Multiply 20 ms by the size of the log, roughly 100 million rows, and the sleeps alone add up to about 2,000,000 seconds, which is more than three weeks.
Try sleeping only after some decent batch size instead. See whether every 10,000 rows works well, like so:
if ($i % 10000 -eq 0) {
    write-host -nonewline "."
    start-sleep -milliseconds 20
}
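For context, a minimal sketch of where that check could sit, using a plain StreamReader while/ReadLine loop rather than the original for loop (variable names borrowed from the script in the question):

$i = 0
while ($null -ne ($line = $dnslog_content.ReadLine()))
{
    $i++
    if ($i % 10000 -eq 0) {
        write-host -nonewline "."
        start-sleep -milliseconds 20
    }
    # ... process $line ...
}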
Upvotes: 0