Dr.YSG

Reputation: 7581

PowerShell script too slow (file enumeration)

I have a simple PowerShell script that runs through a directory tree, and lists the files in JSON format.

Each entry is of the form:

{id: filename, size: bytes }

Works fine for short listings, but very slow for large directories. I also want to write the contents to a file (manifest.json).

I am much better at writing C#/.NET (I would use Directory.EnumerateFiles()), but I thought I would see whether I could get simple things done more easily in PowerShell.

But this script really bogs down when I get to 10K entries.

$src = "G:\wwwroot\BaseMaps\BigBlueMarble"
$path = $src + "\*"
$excludes = @("*.json", "*.ps1")
$version = "1.1"
Write-Host "{" 
Write-Host "`"manifest-version`": `"$version`","
Write-Host "`"files`": [" 

$dirs = Get-Item -Path $path -Exclude $excludes 
$dirs | Get-ChildItem -Recurse -File | % { 
    $fpath = $_.FullName.Replace($src, "").Replace("\","/")
    $date = $_.LastWriteTime
    $size = $_.Length
    $id = $_.BaseName
    Write-Host "{`"id`": `"$id`", `"size`": `"$size`"},"
    } 
Write-Host "]"
Write-Host "}"

Upvotes: 3

Views: 3145

Answers (3)

Dr.YSG

Reputation: 7581

Sometimes it might be better to just write utilities in C# and .NET. Using the very handy JSON.NET library, I put together a WPF application that lets me select a folder (one of them has 100K PNG files) and then create the JSON "manifest" I tried above in less than 2 seconds. Here is the non-UI worker part of the application. Thanks for the tips above; they were helpful.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Windows;
using Newtonsoft.Json;

namespace Manifest
{
    internal class Worker
    {
        private DateTime start;
        private ViewModel vm;
        private readonly BackgroundWorker worker = new BackgroundWorker();
        private ManifestObject manifest;

        public Worker()
        {
            vm = ViewModel.myself;
            manifest = new ManifestObject();
            manifest.version = "1.1";
            manifest.files = new List<FileData>();
            worker.DoWork += build;
            worker.RunWorkerCompleted += done;
            worker.RunWorkerAsync();
        }

        public void build(object sender, DoWorkEventArgs e)
        {

            vm.Status = "Working...";
            start = DateTime.Now;
            scan();
        }

        private void scan()
        {
            var top = new DirectoryInfo(vm.FolderPath);
            try
            {
                foreach (var fi in top.EnumerateFiles("*" + vm.FileType, SearchOption.TopDirectoryOnly))
                {
                    FileData fd = new FileData();
                    fd.size = fi.Length;
                    fd.id = fi.Name.Replace(vm.FileType, "");
                    manifest.files.Add(fd);
                    vm.FileCount++;
                }
            }
            catch (UnauthorizedAccessException error)
            {
                // MessageBox.Show("{0}", ...) would display the literal text "{0}";
                // pass the message directly instead
                MessageBox.Show(error.Message);
            }
        }

        private void done(object sender, RunWorkerCompletedEventArgs e)
        {
            var done = DateTime.Now;
            var elapsed = done - start;
            vm.ElapsedTime = elapsed.ToString();
            vm.Status = "Done Scanning...";
            write();
        }

        private void write()
        {
            File.WriteAllText(vm.FolderPath + @"\manifest.json", JsonConvert.SerializeObject(manifest, Formatting.Indented));
            vm.Status = "Done";
        }
    }
}

Upvotes: 1

Χpẘ

Reputation: 3451

On my system:

$pf = "C:\Program Files" # has about 50,000 files
measure-command {$a=[io.Directory]::EnumerateFiles($pf,"*","AllDirectories")|%{$_}}

was about twice as fast as:

measure-command {$a=gci "C:\Program Files" -Recurse}

The point being that you can use .NET classes very easily from PowerShell, AND they may perform better.

In this case, the Get-ChildItem command has its own .NET class(es) to execute, as well as invoking the file system provider class(es), which no doubt call something in [IO.Directory]. So while the PowerShell provider concept is pretty cool, it does add runtime overhead.
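To connect this back to the question: the same .NET enumerator can feed the asker's manifest directly. A minimal sketch (assuming the `$src` path and the `.json`/`.ps1` exclusions from the question, and PowerShell 3+ for the `-in` operator; collecting objects in a variable rather than emitting per-file `Write-Host`):

    $src = "G:\wwwroot\BaseMaps\BigBlueMarble"
    $entries = foreach ($f in [IO.Directory]::EnumerateFiles($src, "*", "AllDirectories")) {
        $fi = [IO.FileInfo]$f
        # Skip the extensions the question excludes
        if ($fi.Extension -in ".json", ".ps1") { continue }
        [pscustomobject]@{ id = $fi.BaseName; size = $fi.Length }
    }

Building objects and serializing once at the end avoids both the provider overhead and the per-line console writes.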

Upvotes: 1

alroc

Reputation: 28144

Get-ChildItem may be slowish (though it appears to be about twice as fast in PowerShell 3 as it was in v2), but Write-Host is slowing you down a lot too. On a directory structure containing 27,000+ files, the following code ran in 16.15 seconds vs. 21.08 seconds for your code. On a smaller directory containing about 2,400 files, it was 1.15s vs. 1.22s.

# $src and $path as defined in the question's script
gci $path -File -Recurse |
    select @{Name="fpath";Expression={$_.FullName.Replace($src,"").Replace("\","/")}},
           LastWriteTime,
           @{Name="size";Expression={$_.Length}},
           @{Name="id";Expression={$_.BaseName}} |
    select id, size |
    ConvertTo-Json

The resulting JSON doesn't have the header yours does, but you should be able to handle that after the fact.
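For example, the header could be added by wrapping the file list in an outer object before converting; a sketch, assuming the `$version`, `$src`, and `$path` variables from the question (PowerShell 3+ for `-File` and `[ordered]`):

    $files = gci $path -File -Recurse |
        select @{Name="id";Expression={$_.BaseName}}, @{Name="size";Expression={$_.Length}}

    [ordered]@{ "manifest-version" = $version; files = @($files) } |
        ConvertTo-Json -Depth 3 |
        Set-Content (Join-Path $src "manifest.json")

The `@()` around `$files` keeps the `files` property an array even when only one file matches.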

Upvotes: 2
