Reputation: 43
This has been bugging me for while as I feel I have few pieces of the puzzle but I cant put them all together
So my goal is to be able to search all .pdfs in a given location for a keyword or phrase within the content of the files, not the filename, and then use the results of the search to populate an excel spreadsheet.
Before we start, I know that this easy to do using the Acrobat Pro API, but my company are not going to pay for licences for everyone so that this one macro will work.
The windows file explorer search accepts advanced query syntax and will search inside the contents of files assuming that the correct ifilters are enabled. E.g. if you have a word document called doc1.docx and the text inside the document reads "blahblahblah", and you search for "blah" doc1.docx will appear as the result. As far as I know, this cannot be acheived using the FileSystemObject, but if someone could confirm either way that would be really useful?
I have a simple code that opens an explorer window and searches for a string within the contents of all files in the given location. Once the search has completed I have an explorer window with all the files required listed. How do I take this list and populate an excel with the filenames of these files?
dim eSearch As String
eSearch = "explorer " & Chr$(34) & "search-ms://query=System.Generic.String:" & [search term here] & "&crumb=location:" & [Directory Here] & Chr$(34)
Call Shell (eSearch)
Upvotes: 4
Views: 1296
Reputation: 802
I've forgotten everything I ever knew about VBA, but recently stumbled across an easy way to execute Explorer searches using the Shell.Application COM object. My code is PowerShell, but the COM objects & methods are what's critical. Surely someone here can translate.
This has what I think are several advantages:
The query text is identical to what you wouold type in the Search Bar in Explorer, e.g.'Ext:pdf Content:compressor
'
It's easily launched from code and results are easily extracted with code, but SearchResults
window is available for visual inspection/review.
With looping & pauses, you can execute a series of searches in the same window.
I think this ability has been sitting there forever, but the MS documentation of the Document
object & FilterView
method make no mention of how they apply to File Explorer.
I hope others find this useful.
$FolderToSearch = 'c:\Path\To\Folder'
$SearchBoxText = 'ext:pdf Content:compressor'
$Shell = New-Object -ComObject shell.application
### Get handles of currenlty open Explorer Windows
$CurrentWindows = ( $Shell.Windows() | Where FullName -match 'explorer.exe$' ).HWND
$WinCount = $Shell.Windows().Count
$Shell.Open( $FolderToSearch )
Do { Sleep -m 50 } Until ( $Shell.Windows().Count -gt $WinCount )
$WindowToSerch = ( $Shell.Windows() | Where FullName -match 'explorer.exe$' ) | Where { $_.HWND -notIn $CurrentWindows }
$WindowToSearch.Document.FilterView( $SearchBoxText )
Do { Sleep -m 50 } Until ( $WindowToSearch.ReadyState -eq 4 )
### Fully-qualified name:
$FoundFiles = ( $WindowToSearch.Document.Folder.Items() ).Path
### or just the filename:
$FoundFiles = ( $WindowToSearch.Document.Folder.Items() ).Name
### $FoundFIles is an array of strings containing the names.
### The Excel portion I leave to you! :D
Upvotes: 0
Reputation: 42236
Try using the next function, please:
Function GetFilteredFiles(foldPath As String) As Collection
'If using a reference to `Microsoft Internet Controls (ShDocVW.dll)_____________________
'uncomment the next 2 lines and comment the following three (without any reference part)
'Dim ExpWin As SHDocVw.ShellWindows, CurrWin As SHDocVw.InternetExplorer
'Set ExpWin = New SHDocVw.ShellWindows
'_______________________________________________________________________________________
'Without any reference:_____________________________________
Dim ExpWin As Object, CurrWin As Object, objshell As Object
Set objshell = CreateObject("Shell.Application")
Set ExpWin = objshell.Windows
'___________________________________________________________
Dim Result As New Collection, oFolderItems As Object, i As Long
Dim CurrSelFile As String
For Each CurrWin In ExpWin
If Not CurrWin.Document Is Nothing Then
If Not CurrWin.Document.FocusedItem Is Nothing Then
If left(CurrWin.Document.FocusedItem.Path, _
InStrRev(CurrWin.Document.FocusedItem.Path, "\")) = foldPath Then
Set oFolderItems = CurrWin.Document.folder.Items
For i = 0 To oFolderItems.count
On Error Resume Next
If Err.Number <> 0 Then
Err.Clear: On Error GoTo 0
Else
Result.Add oFolderItems.item(CLng(i)).Name
On Error GoTo 0
End If
Next
End If
End If
End If
Next CurrWin
Set GetFilteredFiles = Result
End Function
Like it is, the function works without any reference...
The above function must be called after you executed the search query in your existing code. It can be called in the next (testing) way:
Sub testGetFilteredFiles()
Dim C As Collection, El As Variant
Set C = GetFilteredFiles("C:\Teste VBA Excel\")'use here the folder path you used for searching
For Each El In C
Debug.Print El
Next
End Sub
The above solution iterates between all IExplorer
windows and return what is visible there (after filtering) for the folder you initially used to search.
You can manually test it, searching for something in a specific folder and then call the function with that specific folder path as argument ("\"
backslash at the end...).
Upvotes: 1
Reputation: 175766
Assuming the location is indexed you can access the catalog directly with ADO (add a reference to Microsoft ActiveX Data Objects 2.x):
Dim cn As New ADODB.Connection
Dim rs As New ADODB.Recordset
Dim sql As String
cn.Open "Provider=Search.CollatorDSO;Extended Properties='Application=Windows'"
sql = "SELECT System.ItemNameDisplay, System.ItemPathDisplay FROM SystemIndex WHERE SCOPE='file:C:\look\here' AND System.Kind <> 'folder' AND CONTAINS(System.FileName, '""*.PDF""') AND CONTAINS ('""find this text""')"
rs.Open sql, cn, adOpenForwardOnly, adLockReadOnly
If Not rs.EOF Then
Do While Not rs.EOF
Debug.Print "File: "; rs.Collect(0)
Debug.Print "Path: "; rs.Collect(1)
rs.MoveNext
Loop
End If
Upvotes: 5