Find corrupt lines in textfiles and write them behind the line above

I have around 400 textfiles with circa 41000 corrupt lines.

I am searching for an option (VBA maybe?) which searches for these corrupt lines and basically executes a backspace, so that the corrupt lines are written behind the line before, because the corruption is caused by an unwanted wordwrap. The indicator for corrupt lines is that they don't start with the letters TEQ.

Has anyone any idea how and where to build a script like that? Search and replace does not work since i cant but a backspace in the replace field obviously. Thanks in advance!

EDIT:

An example of a corrupted line:

TEQ;231232;OFNENJD;29840389;TPOS;
TEQ;54111232;O2D;29829;
TPOS;

Line 3 is the corrupted one since it belongs to line 2 but there was a wordwrap. I need to execute a backspace to get it back behind line 2. That's what i'd like to have automated.

Upvotes: 0

Views: 52

Answers (1)

user4039065
user4039065

Reputation:

To isolate the bad end-of-lines, first convert the good end-of-lines to an abstract. You can then remove the vbCrLF or vbLf which will have the effect of backspacing them away. The last step would be to restore the good end-of-lines by reversing the abstract.

dim str as string
'use your favorite method to read the TXT file into the str variable
str = Replace(str, chr(59) & vbCrLf & "TEQ;", chrw(8203))  'convert good eol to unicode zero-length space
str = Replace(str, vbLf, vbNullString)   'remove bad eols
str = Replace(str, chrw(8203), chr(59) & vbCrLf & "TEQ;")  'revert back to good eol
'write the str back to the TXT file

It wouldn't be a bad idea to throw a few of the .TXT files into a hex editor to determine whether the bad end-of-lines are created with vbCrLf (Chr(13) & Chr(10)) or just vbLf (Chr(10)). Same with the good end-of-lines although I suspect the good ones will be vbCrLF and the bad ones just vbLf.

The following Sub procedure requires that you go into the VBE's Tools ► References and add Microsoft Scripting Runtime to the project.

Sub fix_TEQ_text()
    Dim str As String, fp As String, fn As String
    Dim fso As New FileSystemObject, ts As TextStream

    fp = Environ("TEMP")
    fn = Dir(fp & Chr(92) & "TEQ*.txt", vbNormal)

    Do While CBool(Len(fn))
        If Not CBool(InStr(1, fn, "_fixed", vbTextCompare)) Then
            Set ts = fso.OpenTextFile(fp & Chr(92) & fn, ForReading)
            str = ts.ReadAll
            ts.Close

            str = Replace(str, Chr(59) & vbCrLf & "TEQ;", ChrW(8203))  'convert good eol to unicode zero-length space
            str = Replace(str, vbLf, vbNullString)   'remove bad eols
            str = Replace(str, ChrW(8203), Chr(59) & vbCrLf & "TEQ;")  'revert back to good eol

            Set ts = fso.CreateTextFile(fp & Chr(92) & Replace(fn, ".txt", "_fixed.txt"), True)
            ts.Write str
            ts.Close

        End If
        fn = Dir
    Loop
End Sub

You will want to change the file path (e.g. fp) and the file mask (currently "TEQ*.txt" which matched my sample TXT files).

Upvotes: 1

Related Questions