owheel
owheel

Reputation: 43

Extract file names from a File Explorer search into Excel

This has been bugging me for while as I feel I have few pieces of the puzzle but I cant put them all together

So my goal is to be able to search all .pdfs in a given location for a keyword or phrase within the content of the files, not the filename, and then use the results of the search to populate an excel spreadsheet.

Before we start, I know that this easy to do using the Acrobat Pro API, but my company are not going to pay for licences for everyone so that this one macro will work.

The windows file explorer search accepts advanced query syntax and will search inside the contents of files assuming that the correct ifilters are enabled. E.g. if you have a word document called doc1.docx and the text inside the document reads "blahblahblah", and you search for "blah" doc1.docx will appear as the result. As far as I know, this cannot be acheived using the FileSystemObject, but if someone could confirm either way that would be really useful?

I have a simple code that opens an explorer window and searches for a string within the contents of all files in the given location. Once the search has completed I have an explorer window with all the files required listed. How do I take this list and populate an excel with the filenames of these files?

dim eSearch As String
eSearch = "explorer " & Chr$(34) & "search-ms://query=System.Generic.String:" & [search term here] & "&crumb=location:" & [Directory Here] & Chr$(34)
Call Shell (eSearch)

Upvotes: 4

Views: 1296

Answers (3)

Keith Miller
Keith Miller

Reputation: 802

I've forgotten everything I ever knew about VBA, but recently stumbled across an easy way to execute Explorer searches using the Shell.Application COM object. My code is PowerShell, but the COM objects & methods are what's critical. Surely someone here can translate.

This has what I think are several advantages:

  1. The query text is identical to what you wouold type in the Search Bar in Explorer, e.g.'Ext:pdf Content:compressor'

  2. It's easily launched from code and results are easily extracted with code, but SearchResults window is available for visual inspection/review.

  3. With looping & pauses, you can execute a series of searches in the same window.

I think this ability has been sitting there forever, but the MS documentation of the Document object & FilterView method make no mention of how they apply to File Explorer.

I hope others find this useful.

$FolderToSearch = 'c:\Path\To\Folder'
$SearchBoxText  = 'ext:pdf Content:compressor'

$Shell = New-Object -ComObject shell.application

###  Get handles of currenlty open Explorer Windows
$CurrentWindows = ( $Shell.Windows() | Where FullName -match 'explorer.exe$' ).HWND

$WinCount = $Shell.Windows().Count
$Shell.Open( $FolderToSearch )

Do { Sleep -m 50 } Until ( $Shell.Windows().Count -gt $WinCount )

$WindowToSerch = ( $Shell.Windows() | Where FullName -match 'explorer.exe$' ) | Where { $_.HWND -notIn $CurrentWindows }

$WindowToSearch.Document.FilterView( $SearchBoxText )

Do { Sleep -m 50 } Until ( $WindowToSearch.ReadyState -eq 4 )

### Fully-qualified name:
$FoundFiles = ( $WindowToSearch.Document.Folder.Items() ).Path
### or just the filename:
$FoundFiles = ( $WindowToSearch.Document.Folder.Items() ).Name

### $FoundFIles is an array of strings containing the names.
### The Excel portion I leave to you! :D

Upvotes: 0

FaneDuru
FaneDuru

Reputation: 42236

Try using the next function, please:

Function GetFilteredFiles(foldPath As String) As Collection
    'If using a reference to `Microsoft Internet Controls (ShDocVW.dll)_____________________
    'uncomment the next 2 lines and comment the following three (without any reference part)
    'Dim ExpWin As SHDocVw.ShellWindows, CurrWin As SHDocVw.InternetExplorer
    'Set ExpWin = New SHDocVw.ShellWindows
    '_______________________________________________________________________________________
    'Without any reference:_____________________________________
    Dim ExpWin As Object, CurrWin As Object, objshell As Object
    Set objshell = CreateObject("Shell.Application")
    Set ExpWin = objshell.Windows
    '___________________________________________________________
    Dim Result As New Collection, oFolderItems As Object, i As Long

    Dim CurrSelFile As String
    For Each CurrWin In ExpWin
        If Not CurrWin.Document Is Nothing Then
            If Not CurrWin.Document.FocusedItem Is Nothing Then
                If left(CurrWin.Document.FocusedItem.Path, _
                    InStrRev(CurrWin.Document.FocusedItem.Path, "\")) = foldPath Then
                
                    Set oFolderItems = CurrWin.Document.folder.Items
                    For i = 0 To oFolderItems.count
                        On Error Resume Next
                          If Err.Number <> 0 Then
                             Err.Clear: On Error GoTo 0
                          Else
                             Result.Add oFolderItems.item(CLng(i)).Name
                             On Error GoTo 0
                          End If
                    Next
                End If
            End If
        End If
    Next CurrWin
    Set GetFilteredFiles = Result
End Function

Like it is, the function works without any reference...

The above function must be called after you executed the search query in your existing code. It can be called in the next (testing) way:

Sub testGetFilteredFiles()
  Dim C As Collection, El As Variant
  Set C = GetFilteredFiles("C:\Teste VBA Excel\")'use here the folder path you used for searching
  For Each El In C
    Debug.Print El
  Next
End Sub

The above solution iterates between all IExplorer windows and return what is visible there (after filtering) for the folder you initially used to search.

You can manually test it, searching for something in a specific folder and then call the function with that specific folder path as argument ("\" backslash at the end...).

Upvotes: 1

Alex K.
Alex K.

Reputation: 175766

Assuming the location is indexed you can access the catalog directly with ADO (add a reference to Microsoft ActiveX Data Objects 2.x):

Dim cn  As New ADODB.Connection
Dim rs  As New ADODB.Recordset
Dim sql As String

cn.Open "Provider=Search.CollatorDSO;Extended Properties='Application=Windows'"

sql = "SELECT System.ItemNameDisplay, System.ItemPathDisplay FROM SystemIndex WHERE SCOPE='file:C:\look\here' AND System.Kind <> 'folder' AND CONTAINS(System.FileName, '""*.PDF""') AND CONTAINS ('""find this text""')"

rs.Open sql, cn, adOpenForwardOnly, adLockReadOnly

If Not rs.EOF Then
    Do While Not rs.EOF
        Debug.Print "File: "; rs.Collect(0)
        Debug.Print "Path: "; rs.Collect(1)
        rs.MoveNext
    Loop
End If

Upvotes: 5

Related Questions