jascissom

Reputation: 75

Faster alternative to Get-ChildItem -Recurse

I have a script that uses Get-ChildItem to find specific files in a directory. I then compare those files against two different SQL tables and delete them if they meet certain criteria.

Basically this is what happens:

For reference, the $include variable passed to -Include is a unique ID (string) used as the filename; I'm deleting all files whose names match it.

Example:

$include: 9d3aa8ee-e60e-4b4f-9cd0-6678f8a5549e*.*

Query table #1, put results in an array.
Query table #2, put results in an array.

~~~ Pseudocode ~~~

    foreach ($i in $table1) {          # results from table #1
        foreach ($x in $table2) {      # results from table #2

            if (<# constraints are met #>) {
                $files = Get-ChildItem -Path $path -Recurse -Include $include |
                    ForEach-Object { $_.FullName }

                # Delete the files
            }
        }
    }

My problem: There are approximately 14 million files on this server.
I've run the script on a test server with about 1.5 million files, and it takes almost two hours.

I tried to run this script on the live server, but after three days it still had not completed.

How can I do this faster?

Upvotes: 0

Views: 9344

Answers (4)

Jagadish G

Reputation: 681

Well, I don't know exactly what constraints you mean, but a couple of years back I wrote a cmdlet called Find-ChildItem, which is an alternative to Get-ChildItem.

It has more options built in, such as deleting files larger than a given size, older than a given age, or empty. This might let you drop some of the extra loops and cmdlets from your script and thereby improve performance. You may want to give it a try.

You can get more details about this Find-ChildItem cmdlet on my blog, Unix / Linux find equivalent in Powershell Find-ChildItem Cmdlet.

Some of Find-ChildItem's options

  1. Find-ChildItem -Type f -Name ".*.exe"
  2. Find-ChildItem -Type f -Name ".c$" -Exec "Get-Content {} | Measure-Object -Line -Character -Word"
  3. Find-ChildItem -Type f -Empty
  4. Find-ChildItem -Type f -Empty -OutObject
  5. Find-ChildItem -Type f -Empty -Delete
  6. Find-ChildItem -Type f -Size +9M -Delete
  7. Find-ChildItem -Type d
  8. Find-ChildItem -Type f -Size +50m -WTime +5 -MaxDepth 1 -Delete

I hope this helps you a bit...

Upvotes: 0

marceljg

Reputation: 1067

With 14 million files to work with, just how long does it take to find one such file?

You may simply be fighting with the I/O subsystem and the choice of script might not matter as much.

My suggestion is to baseline a single file removal to see whether you can accomplish this task in a reasonable time; if not, you may need to look at your hardware configuration.
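For example, a rough way to get that baseline (a minimal sketch; $path and $include are the variables from the question, and -WhatIf previews the deletions without performing them):

    # Time one find-and-delete pass for a single pattern.
    $elapsed = Measure-Command {
        Get-ChildItem -Path $path -Recurse -Include $include |
            Remove-Item -Force -WhatIf   # drop -WhatIf to actually delete
    }
    "One pass took $($elapsed.TotalMinutes) minutes"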

Upvotes: 0

mjolinor

Reputation: 68303

For just getting the fullname strings from large directory structures, the legacy DIR command with the /B switch can be much faster:

    cmd /c dir $path\9d3aa8ee-e60e-4b4f-9cd0-6678f8a5549e*.* /b /s /a-d
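To feed that into the deletion step, the bare full-path strings can be piped straight to Remove-Item. A sketch, assuming the pattern is held in $include as in the question:

    # /b prints bare names, /s makes them full paths, /a-d skips directories.
    cmd /c dir "$path\$include" /b /s /a-d |
        Remove-Item -Force -WhatIf   # drop -WhatIf to actually delete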

Upvotes: 1

Shay Levy

Reputation: 126842

If I follow you, you're recursing over a huge directory tree for each file pattern you want to remove. If that's the case, I would collect all the patterns first and only then make a single Get-ChildItem call to remove the files.

    $include = foreach ($i in $table1) {
        foreach ($x in $table2) {
            if (<# constraints are met #>) {
                # output the file pattern here
            }
        }
    }

    Get-ChildItem -Path $path -Recurse -Include $include | Remove-Item -Force
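Note that -Include accepts an array of wildcard patterns, so a single recursive scan covers every ID at once. A minimal sketch with hypothetical IDs standing in for the query results:

    # Both patterns are matched in one pass over the directory tree.
    $include = @(
        '9d3aa8ee-e60e-4b4f-9cd0-6678f8a5549e*.*'
        '1b2c3d4e-aaaa-bbbb-cccc-ddddeeeeffff*.*'   # hypothetical second ID
    )
    Get-ChildItem -Path $path -Recurse -Include $include |
        Remove-Item -Force -WhatIf   # drop -WhatIf to actually delete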

Upvotes: 1
