Reputation: 73
Every day we get a flat text file. Some days the file contains lines that need to be deleted before it can be processed. These lines can appear in different places, but they always start with the characters 6999 or 7999. We would like to run a script that will delete these particular lines. However, and this is way beyond me, wherever there is a line that starts with 6999, there will be a line immediately before it that starts with 5442 that also needs to be deleted, but only if it appears immediately before the 6999 line.
We are a Windows shop and would run this script as part of a simple batch file in Windows. We do not use Unix or Linux nor desire to.
The file name extension reflects the date: today's file is file.100621, tomorrow's will be file.100622. I am having trouble with this aspect, as it seems VBScript does not like file.*
Here is a sample of the text file:
4006006602 03334060000100580
40060066039 0334070000100580
700600000011571006210060001255863
544264287250111000025000000000040008000801
6999001000000000000000000000000000000000000000000000000000
6999001000000000000000000000000000000000000000000000000000
6999001000000000000000000000000000000000000000000000000000
799900000011571006210030000000000
8007000000115710062102530054008920
We'd like to remove 5 lines in this file (the 5442 line, the three 6999 lines, and the 7999 line).
Here is a sample of a script that I found on this site and have modified with some success, but I don't know how to delete the lines (I only know how to replace data in a line). I realize this will either need major modifications or need to be thrown out altogether, but I post it to give an idea of what I think we are looking for. I put this in a directory with cscript.exe and call it from a simple batch file:
Set objFS = CreateObject("Scripting.FileSystemObject")
strFile = "c:\temp\file.100621"
Set objFile = objFS.OpenTextFile(strFile)
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    If InStr(strLine,"6999") > 0 Then
        strLine = Replace(strLine,"6999","delete line")
    End If
    WScript.Echo strLine
Loop
Which gets me this:
40060066039 0334070000100580
700600000011571006210060001255863
544264287250111000025000000000040008000801
delete line001000000000000000000000000000000000000000000000000000
delete line001000000000000000000000000000000000000000000000000000
delete line001000000000000000000000000000000000000000000000000000
799900000011571006210030000000000
8007000000115710062102530054008920
Close! I just need to delete the lines instead of writing "delete line". So here are my specific needs based on what I know:
Upvotes: 7
Views: 53704
Reputation: 7
The easiest way would be to open the file in Notepad++ and use its line editing tools. Or use regex (regular expressions) within Notepad++ for even more customization.
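If you would rather apply the same regex idea from a script than by hand, here is a rough sketch using VBScript's RegExp object. The pattern is my own guess at the rule from the question, the input path is the one from the question's example, and the ".clean" output name is just a placeholder. It assumes lines end in LF or CRLF and that the file is not empty (ReadAll raises an error on an empty file).

```vbscript
' Sketch only: paths are placeholders; lines are assumed to end in LF or CRLF.
Option Explicit
Const ForReading = 1, ForWriting = 2

Dim fso, inFile, outFile, text, re
Set fso = CreateObject("Scripting.FileSystemObject")

' Read the whole file into one string.
Set inFile = fso.OpenTextFile("c:\temp\file.100621", ForReading)
text = inFile.ReadAll
inFile.Close

Set re = New RegExp
re.Global = True      ' replace every match, not just the first
re.MultiLine = True   ' let ^ match at the start of each line
' First alternative: a 5442 line whose next line starts with 6999.
' Second alternative: any line that starts with 6999 or 7999.
re.Pattern = "^5442[^\n]*\n(?=6999)|^(?:6999|7999)[^\n]*\n?"
text = re.Replace(text, "")

' Write the cleaned text to a separate (hypothetical) output file.
Set outFile = fso.OpenTextFile("c:\temp\file.100621.clean", ForWriting, True)
outFile.Write text
outFile.Close
```

The look-ahead (?=6999) is what keeps a 5442 line that is not immediately followed by a 6999 line.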
Upvotes: -2
Reputation: 73
OK, here is the final script, as awesomely assembled by Tester101. This script removes the lines that are not needed, as outlined above. It also deals with the line feeds at the end of every line (which were unbeknownst to me).
Select Case WScript.Arguments.Count
    Case 1:
        strInput = GetFile(WScript.Arguments(0))
        RemoveUnwantedLines strInput, strInput
        RemoveBlankLines strInput
    Case 2:
        strInput = GetFile(WScript.Arguments(0))
        strOutput = WScript.Arguments(1)
        RemoveUnwantedLines strInput, strOutput
        RemoveBlankLines strOutput
End Select

'Return the path of the most recently modified file in the directory.
Function GetFile(strDirectory)
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFolder = objFSO.GetFolder(strDirectory)
    dateLastModified = Null
    strFile = ""
    For Each objFile in objFolder.Files
        If IsNull(dateLastModified) Then
            dateLastModified = objFile.DateLastModified
            strFile = objFile.Path
        ElseIf dateLastModified < objFile.DateLastModified Then
            dateLastModified = objFile.DateLastModified
            strFile = objFile.Path
        End If
    Next
    GetFile = strFile
End Function

Sub RemoveUnwantedLines(strInputFile, strOutputFile)
    'Open the file for reading.
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,1)
    'Read the entire file into memory.
    strFileText = objFile.ReadAll
    'Close the file.
    objFile.Close
    'Split the file at the new line character. *Use the Line Feed character (Chr(10))
    arrFileText = Split(strFileText,Chr(10))
    'Open the file for writing.
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strOutputFile,2,true)
    'Loop through the array of lines looking for lines to keep.
    For i = LBound(arrFileText) to UBound(arrFileText)
        'If the line is not blank process it.
        If arrFileText(i) <> "" Then
            'If the line starts "5442", see if the next line is "6999".
            If Left(arrFileText(i),4) = "5442" Then
                'Make sure the next line exists (Don't want an out of bounds exception).
                If i + 1 <= UBound(arrFileText) Then
                    'If the next line is not "6999"
                    If Left(arrFileText(i + 1), 4) <> "6999" Then
                        'Write the "5442" line to the file.
                        objFile.WriteLine(arrFileText(i))
                    End If
                Else
                    'If the next line does not exist, write the "5442" line to the file.
                    objFile.WriteLine(arrFileText(i))
                End If
            'If the line does not start with "6999" and the line does not start with "7999".
            ElseIf Left(arrFileText(i),4) <> "6999" And Left(arrFileText(i),4) <> "7999" Then
                'Write the line to the file.
                objFile.WriteLine(arrFileText(i))
            End If
        End If
    Next
    'Close the file.
    objFile.Close
    Set objFile = Nothing
End Sub

Sub RemoveBlankLines(strInputFile)
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,1)
    'Read the entire file into memory.
    strFileText = objFile.ReadAll
    'Close the file.
    objFile.Close
    'Split the file at the new line character.
    arrFileText = Split(strFileText,VbNewLine)
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,2,true)
    'Loop through the array of lines looking for lines to keep.
    For i = LBound(arrFileText) to UBound(arrFileText)
        'If the line is not blank.
        If arrFileText(i) <> "" Then
            'If there is another element.
            If i + 1 <= UBound(arrFileText) Then
                'If the next element is not blank.
                If arrFileText(i + 1) <> "" Then
                    'Write the line to the file.
                    objFile.WriteLine(arrFileText(i))
                Else
                    'Write the line to the file (Without a blank line).
                    objFile.Write(arrFileText(i))
                End If
            Else
                'Write the line to the file (Without a blank line).
                objFile.Write(arrFileText(i))
            End If
        End If
    Next
    'Close the file.
    objFile.Close
    Set objFile = Nothing
End Sub
Upvotes: 0
Reputation: 8172
I made some changes to try to eliminate the blank line. I also added a function that loops through the output file and removes any blank lines. Hope this one works.
Select Case WScript.Arguments.Count
    Case 1:
        strInput = GetFile(WScript.Arguments(0))
        RemoveUnwantedLines strInput, strInput
        RemoveBlankLines strInput
    Case 2:
        strInput = GetFile(WScript.Arguments(0))
        strOutput = WScript.Arguments(1)
        RemoveUnwantedLines strInput, strOutput
        RemoveBlankLines strOutput
End Select

'Return the path of the most recently modified file in the directory.
Function GetFile(strDirectory)
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFolder = objFSO.GetFolder(strDirectory)
    dateLastModified = Null
    strFile = ""
    For Each objFile in objFolder.Files
        If IsNull(dateLastModified) Then
            dateLastModified = objFile.DateLastModified
            strFile = objFile.Path
        ElseIf dateLastModified < objFile.DateLastModified Then
            dateLastModified = objFile.DateLastModified
            strFile = objFile.Path
        End If
    Next
    GetFile = strFile
End Function

Sub RemoveUnwantedLines(strInputFile, strOutputFile)
    'Open the file for reading.
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,1)
    'Read the entire file into memory.
    strFileText = objFile.ReadAll
    'Close the file.
    objFile.Close
    'Split the file at the new line character. *Use the Line Feed character (Chr(10))
    arrFileText = Split(strFileText,Chr(10))
    'Open the file for writing.
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strOutputFile,2,true)
    'Loop through the array of lines looking for lines to keep.
    For i = LBound(arrFileText) to UBound(arrFileText)
        'If the line is not blank process it.
        If arrFileText(i) <> "" Then
            'If the line starts "5442", see if the next line is "6999".
            If Left(arrFileText(i),4) = "5442" Then
                'Make sure the next line exists (Don't want an out of bounds exception).
                If i + 1 <= UBound(arrFileText) Then
                    'If the next line is not "6999"
                    If Left(arrFileText(i + 1), 4) <> "6999" Then
                        'Write the "5442" line to the file.
                        objFile.WriteLine(arrFileText(i))
                    End If
                Else
                    'If the next line does not exist, write the "5442" line to the file.
                    objFile.WriteLine(arrFileText(i))
                End If
            'If the line does not start with "6999" and the line does not start with "7999".
            ElseIf Left(arrFileText(i),4) <> "6999" And Left(arrFileText(i),4) <> "7999" Then
                'Write the line to the file.
                objFile.WriteLine(arrFileText(i))
            End If
        End If
    Next
    'Close the file.
    objFile.Close
    Set objFile = Nothing
End Sub

Sub RemoveBlankLines(strInputFile)
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,1)
    'Read the entire file into memory.
    strFileText = objFile.ReadAll
    'Close the file.
    objFile.Close
    'Split the file at the new line character.
    arrFileText = Split(strFileText,VbNewLine)
    Set objFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(strInputFile,2,true)
    'Loop through the array of lines looking for lines to keep.
    For i = LBound(arrFileText) to UBound(arrFileText)
        'If the line is not blank.
        If arrFileText(i) <> "" Then
            'If there is another element.
            If i + 1 <= UBound(arrFileText) Then
                'If the next element is not blank.
                If arrFileText(i + 1) <> "" Then
                    'Write the line to the file.
                    objFile.WriteLine(arrFileText(i))
                Else
                    'Write the line to the file (Without a blank line).
                    objFile.Write(arrFileText(i))
                End If
            Else
                'Write the line to the file (Without a blank line).
                objFile.Write(arrFileText(i))
            End If
        End If
    Next
    'Close the file.
    objFile.Close
    Set objFile = Nothing
End Sub
To use it, call it from the command line in one of two ways:
RemoveUnwantedLines "C:\TestDirectory\" "C:\Output.txt"
or
RemoveUnwantedLines "C:\TestDirectory\"
Upvotes: 7
Reputation: 5004
This would be my pseudo-algorithm for solving this issue:
(I would rather teach you how I would think about solving it than provide the code itself.)
1. Make the file a parameter (so it can be flexible), or make a "spooler" folder that this program checks for new content when run, like an "Inbox" for mail. Then you also need an "Outbox". This way you can process files as they come, without knowing what they are named, and move them to the "Outbox" when processed.
2. Make a simple "config" file for this program too. Each line could represent a "filter", and later you could add actions to the lines too, if needed.
7999 delete
6999 delete
5442 delete
as in [pattern] [action]
3. After reading the config into an array of "keys", check the "Inbox" for files. For each file, process it with the key set.
4. Processing file "XXXXXXXXX.log" (or whatever name): load all the lines if there aren't too many, or use readline to grab a single line at a time (depending on performance and memory usage).
5. For each line, take the first 4 characters of the string. Now we need a line to parse:
sLine = left(readline(input filestream), 4)
as we only need the first 4 chars to decide whether to keep it.
6. If this "sLine" (string) is in our array of filters/patterns, then we have a match: do whatever action we have configured (in your current setup, delete = ignore the line).
6a. If ignore, then go on to the next line in the text file (goto #7).
6b. If there is no match in the pattern array, then we have a line to keep. Write it to the OUTPUT stream.
7. If there are more lines, NEXT (goto #5).
8. Close the input and output files.
9. Delete/move the input file from the Inbox (perhaps to a backup?).
10. If there are more files in the [inbox] directory, parse the next one (go to #4).
This isn't pure VBScript, just an algorithm idea for any language.
I hope you can see my idea in it; if not, just comment and I will try to elaborate. I hope this makes for a useful answer.
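For what it's worth, steps 2 through 8 could be sketched in VBScript roughly like this for a single file. The inbox/outbox paths are made up, the "config" is inlined as an array instead of being read from a file, and it assumes ReadLine splits your file into lines the way you expect. Note that with a plain "5442 delete" entry this removes every 5442 line, which is broader than the "only when followed by 6999" rule in the question; that rule would need an extra look-ahead that this sketch leaves out.

```vbscript
' Sketch of steps 2-8 for one file. Paths and the inline "config" are placeholders.
Option Explicit
Const ForReading = 1, ForWriting = 2

Dim fso, filters, entry, parts, inFile, outFile, line, key
Set fso = CreateObject("Scripting.FileSystemObject")

' Steps 2-3: load "[pattern] [action]" pairs into a lookup table.
' In practice these strings would be read from the config file.
Set filters = CreateObject("Scripting.Dictionary")
For Each entry In Array("7999 delete", "6999 delete", "5442 delete")
    parts = Split(entry, " ")
    filters.Add parts(0), parts(1)
Next

' Step 4: open one Inbox file and its Outbox counterpart.
Set inFile = fso.OpenTextFile("c:\inbox\file.100621", ForReading)
Set outFile = fso.OpenTextFile("c:\outbox\file.100621", ForWriting, True)

' Steps 5-7: keep a line unless its first 4 characters map to "delete".
Do Until inFile.AtEndOfStream
    line = inFile.ReadLine
    key = Left(line, 4)
    If filters.Exists(key) Then
        If filters(key) <> "delete" Then outFile.WriteLine line
    Else
        outFile.WriteLine line
    End If
Loop

' Step 8: close both files.
inFile.Close
outFile.Close
```

Using a Dictionary keeps the lookup simple and leaves room for other actions later, as step 2 suggests.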
Upvotes: 0
Reputation: 55009
I think this would work (but I'm not that good at VBS so no promises):
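Set objFS = CreateObject("Scripting.FileSystemObject")
strFile = "c:\temp\file.100621"
Set objFile = objFS.OpenTextFile(strFile)
Dim cachedLine
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    ' A held-back 5442 line is only kept if the line after it does not start with 6999.
    If Len(cachedLine) > 0 And InStr(strLine,"6999") <> 1 Then
        WScript.Echo cachedLine
    End If
    cachedLine = ""
    If InStr(strLine,"5442") = 1 Then
        ' Hold the 5442 line back until we have seen the next line.
        cachedLine = strLine
    Else
        If InStr(strLine,"6999") = 1 Or InStr(strLine,"7999") = 1 Then
            ' do nothing
        Else
            WScript.Echo strLine
        End If
    End If
Loop
' A 5442 line at the very end of the file has no 6999 line after it, so keep it.
If Len(cachedLine) > 0 Then WScript.Echo cachedLine
objFile.Close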
Note that I think you were checking whether the lines contained the numbers anywhere, but you said the rule was that they start with the numbers. That's why I compare the result of InStr to 1 rather than checking for > 0.
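For the file name part of the question, one option might be to build the extension from today's date instead of using a wildcard. This is only a sketch: it assumes the extension is the date as YYMMDD (so 100621 would be 21 June 2010), which the question implies but doesn't state outright, and it reuses the c:\temp folder from the example.

```vbscript
' Assumption: the extension is the current date formatted as YYMMDD.
Option Explicit
Dim d, ext, strFile
d = Date

' Two-digit year, zero-padded month and day.
ext = Right(Year(d), 2) & Right("0" & Month(d), 2) & Right("0" & Day(d), 2)
strFile = "c:\temp\file." & ext

' Check that today's file is there before trying to process it.
If CreateObject("Scripting.FileSystemObject").FileExists(strFile) Then
    WScript.Echo "Would process " & strFile
Else
    WScript.Echo "No file for today: " & strFile
End If
```

The resulting strFile could then be passed to OpenTextFile in place of the hard-coded path.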
Upvotes: 2