Singularity20XX
Singularity20XX

Reputation: 321

Parsing text files from Excel with string manipulation

Below is an example of a parsing program. It takes text from a text file and parses the data using string manipulation, and a couple loops:

    Dim myFile As String
    Dim text As String
    Dim textline As String
    Dim cstAct as integer
    Dim actOpe as integer
    Dim cusNam as integer
    Dim act as integer
    Dim reg as integer

    myFile = "put file patch to text file here"
    myFile = Application.GetOpenFilename()

Here is the do loop that I would like to pause once it reaches line 3 (the next account record)

    Do Until EOF(1)
      Line Input #1, textline
      text = text & textline
    Loop

    cusAct = InStr(text, "ACCOUNT ")
    actOpe = InStr(text, "ACCOUNT OPEN:")
    reg = InStr(text, "REGION:")
    cusNam = InStr(text, "CUSTOMER NAME:")

This is the for...loop I wish to execute once the do...loop stops or 'pauses once it reaches the next record

    For i = 2 To ThisWorkbook.Worksheets("b2").Range("a65536").End(xlUp).Row
      ThisWorkbook.Worksheets("name").Range("a" & i).Value = Mid(text, cstAct + 6, 9)
      ThisWorkbook.Worksheets("name").Range("b" & i).Value = Mid(text, actOpe + 13, 27)
      ThisWorkbook.Worksheets("name").Range("c" & i).Value = Mid(text, reg + 6, 9)
      ThisWorkbook.Worksheets("name").Range("d" & i).Value = Mid(text, cusNam  + 20, 19)

This is where I want to resume the 'do...loop' so that new the sub strings (ie 987654321 would be the new substring that results from Mid(text, cstAct + 6, 9)) of its respective parent string (ie ACCOUNT) refresh so to speak otherwise, lines 1 and 2 will just loop over and over again.

    next i

Below is an example of the sample text file:

    ACCOUNT ABCDEF12                                                                 
    ACCOUNT OPEN:     05/10/15              ACT TYPE: PREMIUM          
    CUSTOMER NAME:    JOHN B. SMITH         CSA REP:  154983                   
    CUSTOMER ADDRESS: 123 SOMEWHERE DRIVE   SOMETHING HERE:                   
    LAST ORDER:       06/24/2011             COUNTRY CODE: UNITED STATES      
    INVOICE #:        123456789             STATE CODE:    CALIFORNIA         
    LAST MAINTENANCE: 01/02/15               COUNTY CODE:  UNCODED            
    SOME INDICATOR:   NO   COMPLAINTS: NO   IPM IND:       DATAPREP/PERF4     
    SOME INDICATOR:   NO STATUS:  NONE      AUTO RENEW:    YES                
    SOMETHING HERE:   NO                             
    SOMETHING HERE:          ABC IND:       
    SOMETHING HERE:   2    ABC ASSET NO:  T                                           
    ACCOUNT ZXYFDG13                                                                 
    ACCOUNT OPEN:     05/10/15              ACT TYPE: PREMIUM          
    CUSTOMER NAME:    JANE B. SMITH         CSA REP:  154983                   
    CUSTOMER ADDRESS: 123 SOMEWHERE DRIVE   SOMETHING HERE:                   
    LAST ORDER:       06/24/2011             COUNTRY CODE: UNITED STATES      
    INVOICE #:        123456789             STATE CODE:    CALIFORNIA         
    LAST MAINTENANCE: 01/02/15               COUNTY CODE:  UNCODED            
    SOME INDICATOR:   NO   COMPLAINTS: NO   IPM IND:       DATAPREP/PERF4     
    SOME INDICATOR:   NO STATUS:  NONE      AUTO RENEW:    YES                
    SOMETHING HERE:   NO                             
    SOMETHING HERE:          ABC IND:  NO     
    SOMETHING HERE:   2    REGION:  NE                       

without adjusting the above code structure, output in excel will look like this:

            A           B           C     D
    ROW 1   123456789   00/00/0000  NY    JON SMITH
    ROW 2   123456789   00/00/0000  NY    JON SMITH

I am trying to get it to look like this:

            A           B           C     D
    ROW 1   123456789   00/00/0000  NY    JON SMITH
    ROW 2   987654321   00/00/0000  FL    JANE SMITH

Any thoughts on how to best do this?

Upvotes: 2

Views: 4613

Answers (1)

joehanna
joehanna

Reputation: 1489

If you know the literal structure of each "record type", then you can declare them as VBA User Defined Type structures for reading (and writing). Further, it looks like you can simplify your efforts with a slightly different code design and improve your error handling.

Consider how I would approach this problem using UDFs, which makes the code so much more readable and therefore, maintainable:-

'Always set this to ensure you have all variables declared
Option Explicit

'User Defined Types for each record format
Private Type AccountInfoType
  OpenText As String * 18           'Absorb all text and prefixes up to data
  OpenDate As String * 8           'Contains the data
  AccTypeText As String * 24        'Absorb all text and prefixes up to data
  AccType As String * 7             'Contains the data
  'Add additional fields here
  CRLF As String * 2                'CR/LF character
End Type

Private Type CustomerNameType
  NameText As String * 18           'Absorb all text and prefixes up to data
  Name As String * 20               'Contains the data
  CsaRepText As String * 12         'Absorb all text and prefixes up to data
  CsaRep As String * 6              'Contains the data
  'Add additional fields here
  CRLF As String * 2                'CR/LF character
End Type

Private Type AddressType
  AddressText As String * 18        'Absorb all text and prefixes up to data
  AddressData As String * 20        'Contains the data
  SomethingHereText As String * 17  'Absorb remaining text
  'Add additional fields here
  CRLF As String * 2                'CR/LF character
End Type

Private Type LastOrderType
  LastOrderText As String * 18      'Absorb all text and prefixes up to data
  LastOrderDate As String * 10      'Contains the data
  CountryText As String * 27        'Absorb all text and prefixes up to data
  Country As String * 13            'Contains the data
  'Add additional fields here
  CRLF As String * 2                'CR/LF character
End Type

Private Type InvoiceType
  InvoiceText As String * 18        'Absorb all text and prefixes up to data
  InvoiceNumber As String * 9       'Contains the data
  StateText As String * 28          'Absorb all text and prefixes up to data
  State As String * 10              'Contains the data
  'Add additional fields here
  CRLF As String * 2                'CR/LF character
End Type

Sub ParseFile()

  Dim wb As Workbook
  Dim ws As Worksheet

  Dim row As Long

  Dim dataRecord As String

  Dim accountNumber As String
  Dim accountInfo As AccountInfoType
  Dim customerName As CustomerNameType
  Dim address As AddressType
  Dim lastOrder As LastOrderType
  Dim invoice As InvoiceType

  Dim myFile As Variant


  'Consider using proper error handling
  On Error GoTo ParseFileZ

  myFile = Application.GetOpenFilename()

  If myFile = False Then
    'Not a fan of GoTo but better than running the whole method inside if/then block
    GoTo ParseFileX
  End If

  'I started with a new workbook. Change this to open an exsting workbook if desired
  Set wb = Application.Workbooks.Add

  'Set this handle to your desired worksheet
  Set ws = wb.Worksheets(1)

  'Set up column headers here. I chose row 3 to allow for a heading in row 1. Choose your own...
  ws.Range("A3").Value = "Acc Number"
  ws.Range("B3").Value = "Acc Opened"
  ws.Range("C3").Value = "Region"
  ws.Range("D3").Value = "Name"

  'Base output row in the worksheet
  row = 3

  'Open the file in binary mode so that you can use User Defined Types to read each record
  Open CStr(myFile) For Binary As #1

  While Not EOF(1)

    'Read next record
    Input #1, dataRecord

    'Find the first record of the next account - otherwise, skip until you get one
    If Left(dataRecord, 7) = "ACCOUNT" And Len(dataRecord) = 16 Then

      'Found the Account Number record. This is the start of the next account
      accountNumber = Mid(dataRecord, 9, 8)

      Get #1, , accountInfo  'Read the Account info record
      Get #1, , customerName 'Read the Customer Name record
      Get #1, , address      'Read the Address record
      Get #1, , lastOrder    'Read the Last Order record
      Get #1, , invoice      'read the Invoice record

      'Ignore the remaining records unless you want to get more data. The "Read Next Record" loop will skip them

      'Get the next row number on the output worksheet to write values to
      row = row + 1

      'Assign the values from the various records
      ws.Cells(row, 1).Value = Trim(accountNumber)
      ws.Cells(row, 2).Value = Trim(accountInfo.OpenDate)
      ws.Cells(row, 3).Value = Trim(invoice.State)  '(you talk about "region" but no region in data sample)
      ws.Cells(row, 4).Value = Trim(customerName.Name)
      'Add more cells for additional records you want to extra fields from here

    End If

  Wend

  'We're finished. Close the file
  Close #1

  'Resize the cells for readibilty
  ws.Cells.EntireColumn.AutoFit

ParseFileX:

  'Disable error handling
  On Error GoTo 0

  'Be a good memory citizen
  Set ws = Nothing
  Set wb = Nothing

  Exit Sub

ParseFileZ:
  MsgBox Err.Number & " - " & Err.Description, "Error occurred"
  Resume ParseFileX

End Sub

Upvotes: 2

Related Questions