elstiv
elstiv

Reputation: 377

Command line utility to copy files based on their content and date

I have a list of more than 10,000 excel files in location A.

I found a free program called Ultrafile Search which managed to identify the files but it does not run with parameters etc so i cannot use it in the scheduler. I also tried WinGrep which seemingly does what I require but it keeps on freezing all the time probably due to the amount of files. A command-line utility or a batch file script would be great. Thanks

Upvotes: 1

Views: 1251

Answers (2)

MC ND
MC ND

Reputation: 70923

Just a basic skeleton. Adapt as needed.

This is a hybrid batch/jscript file (save as .cmd). It uses robocopy to first select the files in the indicated date range. The generated list is passed to the javascript part to filter for content of the excel file. The javascript acts as a filter and echoes only the file names with the indicated string inside. This list is retrieved/processed by a for command (in the sample it simply echoes the file name to console)

Of course, for it to work, it is necessary that excel is installed.

@if (@This==@IsBatch) @then
@echo off
rem **** batch zone *********************************************************
    setlocal enableextensions disabledelayedexpansion

    set "sourceFolder=%cd%"
    set "dateStart=20140709"
    set "searchString=testing"

    set "dateFilter=robocopy "%sourceFolder%" "%temp%" *.xls /l /njh /njs /ndl /nc /ns /maxage:%dateStart% "
    set "contentFilter=cscript //nologo //e:Javascript "%~f0" /search:"%searchString%" "

    for /f "delims=" %%a in (' %dateFilter% ^| %contentFilter% ') do (
        echo Selected file : [%%a]
    )

    endlocal
    exit /b

@end
// **** Javascript zone *****************************************************

    // retrieve the string to search for
    var searchFor = WScript.Arguments.Named('search');

    // instantiate needed components
    var fso = new ActiveXObject('Scripting.FileSystemObject');
    var excel = new ActiveXObject('Excel.Application');

    // iterate over the stdin reading file names to process
    while ( ! WScript.StdIn.AtEndOfStream ){
        var fileName = WScript.StdIn.ReadLine();
        fileName = fileName.replace(/^\s*/g,'');
        fileName = fso.GetAbsolutePathName(fileName);
        if (fso.FileExists( fileName )){
            if (excelDataSearch( excel, fileName, searchFor )) {
                WScript.StdOut.WriteLine( fileName );
            };
        };
    };

    // if not needed, close the instantiated excel application
    if ( excel.WorkBooks.Count === 0 ) excel.Quit();

    // leave
    WScript.Quit(0);


function excelDataSearch( excel, workbook, searchString ){
    var returnValue = false ;

    try {
        // search for workbook application
        //var wb = GetObject( workbook );
        var wb = excel.Workbooks.Open( workbook, false, true );

        // iterate the worksheets of the workbook searching the string in its data
        var eWS = new Enumerator(wb.WorkSheets);
        for (; !eWS.atEnd() ; eWS.moveNext()){
            var ws = eWS.item();
            if ( ws.UsedRange.Cells.Find(searchString) ){
                returnValue = true ;
                break;
            };
        };

        // close workbook
        wb.Close( false );

    } catch(e){
            // some kind of problem with objects
            // WScript.Echo( e.description );
    };

    return returnValue;
};

Upvotes: 0

brettdj
brettdj

Reputation: 55682

PowerShell solution that

  • finds all excel files under c:\temp recursively
  • checks if they have been used since 1/1/2010
  • checks cell A1 in sheet 1 equals a variable str,

and if so copies the file to a new directory (c:\test)

$newpath = "c:\test"
$str = "Test"
$xl = New-Object -comobject Excel.Application
$xl.visible = $false

$excelSheets = Get-ChildItem c:\temp -recurse -include *.xls*  | where-object {$_.lastwritetime -gt “1/1/2010}

foreach($excelSheet in $excelSheets)
{
$workbook = $xl.Workbooks.Open($excelSheet)
$ws = $workbook.workSheets.Item(1)
$strxl = $ws.Cells.Item(1,1).Value2
$workbook.close()
if ($strxl -eq $str) {Copy-Item $excelsheet $newpath}
}#end foreach
$xl.quit()
$xl = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()

Upvotes: 1

Related Questions