Reputation: 7581
I have a simple PowerShell script that walks a directory tree and lists the files in JSON format.
Each entry is of the form:
{id: filename, size: bytes }
It works fine for short listings but is very slow for large directories. I also want to write the output to a file (manifest.json).
I am much better at writing C#/.NET (where I would use Directory.EnumerateFiles()), but I wanted to see whether simple things like this are easier in PowerShell.
As written, the script really bogs down once it reaches about 10K entries.
$src = "G:\wwwroot\BaseMaps\BigBlueMarble"
$path = $src + "\*"
$excludes = @("*.json", "*.ps1")
$version = "1.1"
Write-Host "{"
Write-Host "`"manifest-version`": `"$version`","
Write-Host "`"files`": ["
$dirs = Get-Item -Path $path -Exclude $excludes
$dirs | Get-ChildItem -Recurse -File | % {
    $fpath = $_.FullName.Replace($src, "").Replace("\","/")
    $date = $_.LastWriteTime
    $size = $_.Length
    $id = $_.BaseName
    Write-Host "{`"id`": `"$id`", `"size`": `"$size`"},"
}
Write-Host "]"
Write-Host "}"
Upvotes: 3
Views: 3145
Reputation: 7581
Sometimes it might be better to just write utilities in C# and .NET. Using the very handy JSON.NET library, I put together a WPF application that lets me select a folder (one of them has 100K PNG files) and then creates the JSON "manifest" I attempted above in less than 2 seconds. Here is the non-UI worker part of the application. Thanks for the tips above, they were helpful.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Windows;
using Newtonsoft.Json;

namespace Manifest
{
    internal class Worker
    {
        private DateTime start;
        private ViewModel vm;
        private readonly BackgroundWorker worker = new BackgroundWorker();
        private ManifestObject manifest;

        public Worker()
        {
            vm = ViewModel.myself;
            manifest = new ManifestObject();
            manifest.version = "1.1";
            manifest.files = new List<FileData>();
            worker.DoWork += build;
            worker.RunWorkerCompleted += done;
            worker.RunWorkerAsync();
        }

        // Runs on the background thread.
        public void build(object sender, DoWorkEventArgs e)
        {
            vm.Status = "Working...";
            start = DateTime.Now;
            scan();
        }

        // Enumerates the files in the selected folder (top level only) and records id and size.
        private void scan()
        {
            var top = new DirectoryInfo(vm.FolderPath);
            try
            {
                foreach (var fi in top.EnumerateFiles("*" + vm.FileType, SearchOption.TopDirectoryOnly))
                {
                    FileData fd = new FileData();
                    fd.size = fi.Length;
                    fd.id = fi.Name.Replace(vm.FileType, "");
                    manifest.files.Add(fd);
                    vm.FileCount++;
                }
            }
            catch (UnauthorizedAccessException error)
            {
                // Show the exception message itself; MessageBox.Show does not take a format string.
                MessageBox.Show(error.Message);
            }
        }

        // Runs when the background work has finished.
        private void done(object sender, RunWorkerCompletedEventArgs e)
        {
            var done = DateTime.Now;
            var elapsed = done - start;
            vm.ElapsedTime = elapsed.ToString();
            vm.Status = "Done Scanning...";
            write();
        }

        // Serializes the manifest to manifest.json in the scanned folder.
        private void write()
        {
            File.WriteAllText(vm.FolderPath + @"\manifest.json", JsonConvert.SerializeObject(manifest, Formatting.Indented));
            vm.Status = "Done";
        }
    }
}
Upvotes: 1
Reputation: 3451
On my system:
$pf = "C:\Program Files" # has about 50,000 files
measure-command {$a=[io.Directory]::EnumerateFiles($pf,"*","AllDirectories")|%{$_}}
was about twice as fast as:
measure-command {$a=gci "C:\Program Files" -Recurse}
The point is that you can use .NET classes very easily from PowerShell, and they may perform better.
In this case, Get-ChildItem has to run its own .NET class(es) and also invoke the file system provider class(es), which no doubt end up calling something in [IO.Directory]. So while the PowerShell provider concept is pretty cool, it does add runtime overhead.
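Applied to the manifest from the question, the same idea might look like the sketch below. The function name Get-ManifestEntry and its parameters are hypothetical, it assumes PowerShell 3 or later, and unlike the original script it skips .json and .ps1 files at every level rather than only at the top of the tree.

```powershell
# Hypothetical helper: enumerate files with .NET instead of Get-ChildItem
# and emit one object per file with the two fields the manifest needs.
function Get-ManifestEntry {
    param(
        [Parameter(Mandatory)] [string] $Root,
        # Assumption: exclusions are matched by extension at every depth.
        [string[]] $ExcludeExtensions = @(".json", ".ps1")
    )

    $dir = New-Object System.IO.DirectoryInfo $Root

    # EnumerateFiles streams FileInfo objects lazily as the tree is walked.
    foreach ($fi in $dir.EnumerateFiles("*", [System.IO.SearchOption]::AllDirectories)) {
        if ($ExcludeExtensions -contains $fi.Extension) { continue }

        [pscustomobject]@{
            id   = [System.IO.Path]::GetFileNameWithoutExtension($fi.Name)
            size = $fi.Length
        }
    }
}
```

Usage would be something like `Get-ManifestEntry -Root $src | ConvertTo-Json`, with $src as defined in the question. Note that size comes out as a JSON number here rather than a quoted string, and that the enumeration can throw UnauthorizedAccessException if part of the tree is not readable.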
Upvotes: 1
Reputation: 28144
Get-ChildItem may be slowish (though it appears to be about twice as fast in PowerShell 3 as it was in v2), but Write-Host is slowing you down a lot too. On a directory structure containing 27000+ files, the following code ran in 16.15 seconds vs 21.08 seconds for your code. On a smaller directory containing about 2400 files, it was 1.15s vs 1.22s.
gci $path -file -Recurse |
select @{name="fpath";expression={$_.fullname.replace($src,"").replace("\","/")}},lastwritetime,@{Name="size";Expression={$_.length}},@{Name="id";Expression={$_.basename}}|
select id,size|
ConvertTo-Json
The resulting JSON doesn't have the header yours does, but you should be able to handle that after the fact.
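One way to handle the header is to wrap the file objects in an outer object before converting, so that ConvertTo-Json produces the whole document in one go. This is only a sketch: the function name ConvertTo-ManifestJson and its parameters are hypothetical, and the key names are taken from the question's output.

```powershell
# Hypothetical helper: wrap the per-file objects in the manifest header
# and serialize everything with a single ConvertTo-Json call.
function ConvertTo-ManifestJson {
    param(
        [object[]] $Files = @(),
        [string] $Version = "1.1"
    )

    # [ordered] keeps "manifest-version" ahead of "files" in the output.
    $manifest = [ordered]@{
        "manifest-version" = $Version
        files              = @($Files)
    }

    # -Depth is set explicitly so the nested file objects are not truncated.
    ConvertTo-Json -InputObject $manifest -Depth 3
}
```

With $files holding the objects produced by the pipeline above (everything before ConvertTo-Json), something like `ConvertTo-ManifestJson -Files $files | Set-Content -Path (Join-Path $src "manifest.json")` would also take care of writing manifest.json.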
Upvotes: 2