jjb

Reputation: 37

How can I make my PowerShell parsing script faster?

I have this PowerShell script to parse many text files at once (about 1 MB) that look like configuration files:

Script:

$counter = ($false,0,0)
$objcounter = 0
$global:files = [ordered]@{}
$txt = [System.IO.File]::ReadAllLines($opath)

foreach($line in $txt){
    if ($counter[2] -eq "spline"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="spline"}}};$counter = ($false,0,0)}}
    elseif ($counter[2] -eq "object"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="sceneryobject"}}};$counter = ($false,0,0)}}
    elseif ($counter[2] -eq "splineh"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="splineh"}}};$counter = ($false,0,0)}}
    elseif ($counter[2] -eq "attachedobject"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="attachedobject"}}};$counter = ($false,0,0)}}
    elseif ($counter[2] -eq "splineattachement"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="splineattachement"}}};$counter = ($false,0,0)}}

    if ($line -eq "[spline]"){
        $counter = @($true,1,"spline");$objcounter++} 
    if ($line -eq "[splineh]"){
        $counter = @($true,1,"splineh");$objcounter++}
    if ($line -eq "[object]"){
        $counter = @($true,1,"object");$objcounter++}
    if ($line -eq "[attachObj]"){
        $counter = @($true,1,"attachedobject");$objcounter++}
    if ($line -eq "[splineAttachement]"){
        $counter = @($true,1,"splineattachement");$objcounter++}
}

(I know that it isn't well-structured.)

File:

[spline]
0
apath\path\file3.ext
8947
8946
8992
0.0584106412565594
0.250000081976033
195.973568100565
90.0000020235813
39.99999937227
0
0
0
0
0
0
0
180.853118555128


[spline_h]
0
apath\path\file2.ext
8949
8948
9022
0.0565795901830857
0.250000202235118
202.972286028874
90.0000020235813
39.99999937227
0
0
0
0
0
0
0
183.907441598005
mirror

[spline]
0
apath\path\file.ext
8951
0
9019
0.0585327145350332
0.0999999434550936
201.971026072961
90.0000020235813
39.99999937227
0
0
0
0
0
0
0
183.47110728047
mirror

(and so on…)

The script works fine, but it takes very long to parse the files and after a while I get “no response” and the app crashes.

This is the output that I need:

$global:files = [ordered]@{path=@{path="path";type="type"}}

Where 'path' is the file path, like: apath\path\file.ext and 'type' is the mesh type, like: spline or spline_h.

What can I change to make the parsing faster?

Upvotes: 1

Views: 209

Answers (1)

Santiago Squarzon

Reputation: 61128

Here is an example of how it could be improved, mainly using regex plus some string manipulation. Note that I'm far from an expert with regex and I'm quite sure it could be improved further, but as is, it works for me.

It was not clear to me what should happen when two or more types ([keyword]) share the same path (the path being the hashtable key). Right now the code assumes there are no duplicate paths in the file; if there are, .Add() will throw on the second occurrence.
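
If duplicate paths can occur, one option is to check for the key before adding. This is a minimal sketch, assuming the first type seen for a path should win (an assumption, since the question does not say). The `$pairs` array is hypothetical sample data standing in for the parsed matches.

```powershell
# Hypothetical sample data: (type, path) pairs, one path appearing twice.
$pairs = @(
    @{ type = 'spline';   path = 'apath\path\file.ext' }
    @{ type = 'spline_h'; path = 'apath\path\file2.ext' }
    @{ type = 'object';   path = 'apath\path\file.ext' }   # duplicate path
)

$result = [ordered]@{}

foreach ($p in $pairs) {
    # .Add() throws on an existing key, so test with .Contains() first.
    # Assumption: the first occurrence wins and later duplicates are ignored.
    if (-not $result.Contains($p.path)) {
        $result.Add($p.path, [ordered]@{ path = $p.path; type = $p.type })
    }
}
```

If the last occurrence should win instead, assigning through the indexer (`$result[$p.path] = ...`) overwrites the existing entry without throwing.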

For the regex explanation see: https://regex101.com/r/aN4WNR/1

NOTE: This only works because the paths end with .ext. If that is not the case in your real files, you should clarify that too.

The regex expects a single multi-line string to work properly, so you need to read the file with one of these (which should also improve the efficiency of the script):

  • Get-Content -Raw
  • [System.IO.File]::ReadAllText(...)

$txt = Get-Content -Raw ./test.txt
$re = [regex]::Matches($txt, '(?ms)\b(?<=(\[)).*?\.ext\b')
$result = [ordered]@{}

foreach($r in $re)
{
    $parse = $r.Value -split '\r?\n'
    $type = $parse[0].Replace(']','')
    $path = $parse[-1]

    $result.Add(
        $path,
        [ordered]@{
            path = $path
            type = $type
        }
    )
}

Result:

PS /> $result

Name                           Value
----                           -----
apath\path\file3.ext           {path, type}
apath\path\file2.ext           {path, type}
apath\path\file.ext            {path, type}

PS /> $result['apath\path\file2.ext']

Name                           Value
----                           -----
path                           apath\path\file2.ext
type                           spline_h
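
If the paths do not reliably end with .ext, a line-based pass may be an alternative. The sketch below assumes the path always sits two lines below a `[keyword]` header, as in the sample from the question; the inline `$sample` text is a shortened stand-in for the real file. It also adds entries in place instead of using `$global:files += ...`, which builds a new dictionary on every addition and is likely a large part of the slowdown.

```powershell
# Shortened stand-in for the real file (layout taken from the question).
$sample = @'
[spline]
0
apath\path\file3.ext
8947

[spline_h]
0
apath\path\file2.ext
8949
mirror
'@

$lines  = $sample -split '\r?\n'
$result = [ordered]@{}

for ($i = 0; $i -lt $lines.Count; $i++) {
    # A header line looks like [keyword]; assumption: the path is two lines below it.
    if (($lines[$i] -match '^\[(?<type>[^\]]+)\]$') -and (($i + 2) -lt $lines.Count)) {
        $path = $lines[$i + 2]
        # Indexer assignment adds in place and never throws; on duplicates the last one wins.
        $result[$path] = [ordered]@{ path = $path; type = $Matches['type'] }
        $i += 2   # skip the two lines already consumed
    }
}
```

For a real file, `$lines` could come from `[System.IO.File]::ReadAllLines($opath)` as in the question.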

Upvotes: 1
