Ikarian
Ikarian

Reputation: 51

Importing and parsing multiple text files into single XLS?

I have been googling at this all day, but as I have zero VBA skills, I can't find anything that works properly and that I can adapt to my needs.

I have a folder with about 4500 text files with the outputs of a hardware test run on computers we process. The text file has a lot of info I don't need. What I do need is to import all these files into a single spreadsheet and then parse them down to 3 desired fields each - and in such a way that is easily readable.

I have had a hard time even finding a VBA macro that import the text files without an error I can't get past. I did find the following one, however it is set up to look for a second column in the file, which in my case screws up the inputted data and throws around values all over the place (think minor formatting issue times 4500 records).

Here is the macro:

Sub test() 
  Dim myDir As String, fn As String, ff As Integer, txt As String 
  Dim delim As String, n As Long, b(), flg As Boolean, x 
  myDir = "c:\test" '<- change to actual folder path
  delim = vbTab '<- delimiter (assuming Tab delimited)
  Redim b(1 To Rows.Count, 1 To 2) 
  fn = Dir(myDir & "\*.txt") 
  Do While fn <> "" 
    ff = FreeFile 
    Open myDir & "\" & fn For Input As #ff 
    Do While Not EOF(ff) 
      Line Input #ff, txt 
      x = Split(txt, delim) 
      n = n + 1 
      If Not flg Then b(n,2) = fn 
      If UBound(x) > 0 Then 
        b(n,1) = x(1) 
      End If 
      flg = True 
    Loop 
    Close #ff 
    flg = False 
    fn = Dir() 
  Loop 
  ThisWorkbooks.Sheets(1).Range("a1").Resize(n,2).Value = b 
End Sub 

And here is a sample of one of the text files:

ILPN Number: I01128204
MAC Address: E0DB55820F85

Hardware:

CPU:Intel(R) Core(TM) i3-2370M CPU @ 2.40GHz
MOTHERBOARD:Dell Inc. - 0G8TPV - A02
BIOS VERSION/DATE:A02 - 08/15/2012
RAM DETECTED:4096 MB (BANK 0: 0/DDR3/1333 - BANK 2: 0/Unknown/0)
MANUFACTURER:Dell Inc.
PRODUCT:Inspiron 3520
SERIAL:B1JW9V1
NIC SPEED/NAME:100 Mbps - Realtek PCIe FE Family Controller, V:8.1
GRAPHIC RES:1366 x 768 (32 bits)
OPTIC DRIVE:HL-DT-ST, DVD+-RW GT80N   , A103 (E:)
FIXED DISK:ST500LM012 HN-M500MBB - 465.76 GB (C: - GPT)
 - 0.00 GB (D: - MBR)
WINPE:Microsoft Windows 8 Ultimate Edition, 32-bit (build 9200)

Module 126: Result: Ok

Initializing module version='1.0.0.17' with ''
Module Initialization done.
Starting module with: Param1=0x00010010 - Param2=0xFFFF0000
Module start: 14/05/2013 10:38:40
Set language module to: en-US
  Found disk: 0 - 'ST500LM012 HN-M500MBB'
All PHYSICALDRIVE will be used as valid target...
Module started properly.
Cleaning element 'Disk ID: 0 - Model: ST500LM012 HN-M500MBB - Size: 465.762 GB' with       algorithm 'Basic (random)'...
Clean process Successful
Process took 6099 sec to clean 476940.02MB ~ 78.20 MB/Sec
Closing module with code '1'...
Closing module done with result '0'
Releasing module...
Releasing module done at: 14/05/2013 12:20:19

So, from all that, I need to pull out the "ILPN number", The MAC address, and the line that says "Module 126: Result OK". Everything else can go away. It would be nice if I could get it sorted with columns for the three values, and then each record in its own row.

This is probably a two step process. Either or both solutions would be tremendously helpful. Thank you!

Upvotes: 0

Views: 11652

Answers (2)

Ekkehard.Horner
Ekkehard.Horner

Reputation: 38745

Restating the task/problem:

Given a lot of structured text files in a folder, extract from each file three (or a multiple of three?) data items (ILPN Number, MAC Address, Test Result) and put them as rows in a .txt/.csv file that can be red by/imported into Excel.

Idea:

Loop over the files, read each into memory, parse/extract the data triple(s) using a RegExp, write them into the output file; use Excel to open the output file, do further manipulation manually.

Plan:

Use a VBScript from the command line to keep things simple.

[If that look good to you and you can answer the question "Does each input file contain one or more info triple(s)?", I'm willing to add some proof of concept code to this concept.]

While you wait:

Given your additional info, I believe that @Ansgar's assumptions about your input data are mostly correct, and I fixed two typos in the code. So give his script (+1) a try by starting a "Command Prompt", creating/changing to a suitable directory, copy the code into a file - say ansgar.vbs -, adapt the folder/output file specs to your need, and run it via cscript ansgar.vbs.

Small changes - like putting the whole "Module" line into the 3rd item

If Left(line, 10) = "Module 126" Then
   ws.Cells(row, 3).Value = Trim(Split(line, ":")(2))
==>
If Left(line, 7) = "Module " Then
   ws.Cells(row, 3).Value = line

are not to hard. If you are lucky, your problem is solved.

Update (wrt comment/subscript):

I think the

ws.Cells(row, 3).Value = Trim(Split(line, ":")(2))

line is to blame. The Split() on ":" should split input lines like

Module 126: Result: Ok

into an array of three elements "Module 126", " Result", and " Ok" numbered/indexed/subscripted from 0 to 2. If Split() does not get two : separators in the line, the resulting array is smaller and the access to element #2 will fail.

You'll have to assign the return value from Split() to a variable, check the size of the array (UBound), look at the faulty lines, and decide whether to ignore them or to change the guarding If condition.

Upvotes: 0

Ansgar Wiechers
Ansgar Wiechers

Reputation: 200203

A VBScript for reading the three lines from multiple input files into an Excel sheet could look like this:

Set xl = CreateObject("Excel.Application")
xl.Visible = True

Set wb = xl.Workbooks.Add
Set ws = wb.Sheets(1)

row = 1
ws.Cells(row, 1).Value = "ILPN"
ws.Cells(row, 2).Value = "MAC Address"
ws.Cells(row, 3).Value = "Module 126"

Set fso = CreateObject("Scripting.FileSystemObject")
For Each f In fso.GetFolder("C:\your\folder").Files
  If LCase(fso.GetExtensionName(f.Name)) = "txt" Then
    row = row + 1
    Set stream = f.OpenAsTextStream
    ws.Cells(row, 1).Value = Trim(Split(stream.ReadLine, ":")(1))
    ws.Cells(row, 2).Value = Trim(Split(stream.ReadLine, ":")(1))
    Do Until stream.AtEndOfStream
      line = stream.ReadLine
      If Left(line, 10) = "Module 126" Then
        ws.Cells(row, 3).Value = Trim(Split(line, ":")(2))
        Exit Do
      End If
    Loop
    stream.Close
  End If
Next

wb.SaveAs "C:\some\folder\output.xls", -4143, , , , False
wb.Close
xl.Quit

As a VBA macro the following should work:

Sub LoadDataFromFiles
  row = 1
  ActiveSheet.Cells(row, 1).Value = "ILPN"
  ActiveSheet.Cells(row, 2).Value = "MAC Address"
  ActiveSheet.Cells(row, 3).Value = "Module 126"

  Set fso = CreateObject("Scripting.FileSystemObject")
  For Each f In fso.GetFolder("C:\your\folder").Files
    If LCase(fso.GetExtensionName(f.Name)) = "txt" Then
      row = row + 1
      Set stream = f.OpenAsTextStream
      ActiveSheet.Cells(row, 1).Value = Trim(Split(stream.ReadLine, ":")(1))
      ActiveSheet.Cells(row, 2).Value = Trim(Split(stream.ReadLine, ":")(1))
      Do Until stream.AtEndOfStream
        line = stream.ReadLine
        If Left(line, 10) = "Module 126" Then
          ActiveSheet.Cells(row, 3).Value = Trim(Split(line, ":")(2))
          Exit Do
        End If
      Loop
      stream.Close
    End If
  Next

  ActiveWorkbook.Save
End Sub

Upvotes: 1

Related Questions