Reputation: 35
I'm new in VBA. Before posting my question here,I have spent almost 3 days surfing Internet.
I have 300+ text files (text converted from PDF using OCR),from text file. I need to get all words that contain "alphabet" and "digits" (as example KT315A, KT-315-a, etc) along with source reference (txt file name).
What I need is
1.add "smart filter" that will copy only words that contains
"alphabets" and "digits"
paste copied data to column A
add reference file name to column B
I have found code below that can copy all data from text files into excel spreadsheet.
text files look like
"line from 252A-552A to ddddd, ,,, @,@,rrrr, 22 , ....kt3443 , fff,,,etc"
final result in xls should be
A | B
252A-552A | file1
kt3443 | file1
Option Explicit
Const sPath = "C:\outp\" 'remember end backslash
Const delim = "," 'comma delimited text file - EDIT
'Const delim = vbTab 'for TAB delimited text files
Sub ImportMultipleTextFiles()
Dim wb As Workbook
Dim sFile As String
Dim inputRow As Long
RefreshSheet
On Error Resume Next
sFile = Dir(sPath & "*.txt")
Do Until sFile = ""
inputRow = Sheets("Temp").Range("A" & Rows.Count).End(xlUp).Row + 1
'open the text file
'format=6 denotes a text file
Set wb = Workbooks.Open(Filename:=sPath & sFile, _
Format:=6, _
Delimiter:=delim)
'copy and paste
wb.Sheets(1).Range("A1").CurrentRegion.Copy _
Destination:=ThisWorkbook.Sheets("Temp").Range("A" & inputRow)
wb.Close SaveChanges:=False
'get next text file
sFile = Dir()
Loop
Set wb = Nothing
End Sub
Sub RefreshSheet()
'delete old sheet and add a new one
On Error Resume Next
Application.DisplayAlerts = False
Sheets("Temp").Delete
Application.DisplayAlerts = True
Worksheets.Add
ActiveSheet.Name = "Temp"
On Error GoTo 0
End Sub
thanks!
Upvotes: 0
Views: 1633
Reputation: 60474
It's a little tough to tell exactly what constitutes a word from your example. It clearly can contain characters other than letters and numbers (eg the dash), but some of the items have dots preceding, so it cannot be defined as being delimited by a space
.
I defined a "word" as a string that
To do this, I first replaced all the commas with spaces, and then applied an appropriate regular expression. However, this might accept undesired strings, so you might need to be more specific in defining exactly what is a word.
Also, instead of reading the entire file into an Excel workbook, by using the FileSystemObject
we can process one line at a time, without reading 300 files into Excel. The base folder is set, as you did, by a constant in the VBA code.
But there are other ways to do this.
Be sure to set the references for early binding as noted in the code:
Option Explicit
'Set References to:
' Microsoft Scripting Runtime
' Microsoft VBscript Regular Expressions 5.5
Sub SearchMultipleTextFiles()
Dim FSO As FileSystemObject
Dim TS As TextStream, FO As Folder, FI As File, FIs As Files
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim WS As Worksheet, RW As Long
Const sPath As String = "C:\Users\Ron\Desktop"
Set FSO = New FileSystemObject
Set FO = FSO.GetFolder(sPath)
Set WS = ActiveSheet
WS.Columns.Clear
Set RE = New RegExp
With RE
.Global = True
.Pattern = "(?:\d(?=\S*[a-z])|[a-z](?=\S*\d))+\S*[a-z\d]"
.IgnoreCase = True
End With
For Each FI In FO.Files
If FI.Name Like "*.txt" Then
Set TS = FI.OpenAsTextStream(ForReading)
Do Until TS.AtEndOfStream
'Change .ReadLine to .ReadAll *might* make this run faster
' but would need to be tested.
Set MC = RE.Execute(Replace(TS.ReadLine, ",", " "))
If MC.Count > 0 Then
For Each M In MC
RW = RW + 1
WS.Cells(RW, 1) = M
WS.Cells(RW, 2) = FI.Name
Next M
End If
Loop
End If
Next FI
End Sub
Upvotes: 2