Reputation: 1119
I've been fooling around a bit with batch files, and I'm wondering why there is such a huge difference in the time required to output a file in the scenarios below:
Scenario 1: A simple traversal of a log file, taking the 5th token of every row unless it contains a filter string.
(for /f "tokens=5" %%a in (test.log) do @echo(%%a) | findstr /v "filter_1 filter_2" > !filter!.txt
This works great: going through a 50 MB file gives me a smaller 10 MB file in 10 seconds.
Scenario 2: Do exactly the same, but add something before and after the token so I can output an XML file rather than a text file. To do so I had to rebuild it a bit, as below:
echo ^<rows^> > test.xml
>>test.xml (
for /f "tokens=5" %%a in (
'findstr /v "filter1 filter2" test.log'
) do echo ^<r a="%%a"/^>
)
echo ^</rows^> >> test.xml
It works as expected for small files, but takes forever for large files. Is there any way to achieve what I want in scenario 2 using the scenario 1 syntax, since that seems much more efficient?
Upvotes: 2
Views: 1007
Reputation: 130879
FOR /F always buffers the content of the IN() clause before beginning any iterations. This is true both when reading a file and when processing the output of a command. However, I believe there is some fundamental difference in how command output is buffered that makes it particularly slow with large output. Edit: MC ND has a nice explanation for why buffering of large output is so slow.
Most people are surprised to learn that sometimes the fastest batch solution is to write the command output to a temp file, and then use FOR /F to read the temp file. This will be fast as long as your disk drive is fast.
I believe the following will speed things up considerably:
findstr /v "filter1 filter2" test.log >test.log.mod
>test.xml (
echo ^<rows^>
for /f "tokens=5" %%A in (test.log.mod) do echo ^<r a="%%A"/^>
echo ^</rows^>
)
del test.log.mod
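If a fixed temp file name next to the log is a concern (for example, when two copies of the script might run at once), the same approach can be sketched with a temp file under %TEMP%. This is only a variation on the code above, not something tested against your data: the variable name tempFile and the use of %RANDOM% in the file name are my own assumptions, and "usebackq" is needed because the file name is quoted.

```batch
@echo off
setlocal
rem Hypothetical temp file name; %RANDOM% only reduces the chance of a name clash.
set "tempFile=%TEMP%\test_%RANDOM%.tmp"

rem Step 1: let FINDSTR do the filtering and write the result to disk.
findstr /v "filter1 filter2" test.log >"%tempFile%"

rem Step 2: read the temp file with FOR /F instead of reading command output.
rem usebackq makes FOR /F treat the double-quoted string as a file name.
>test.xml (
  echo ^<rows^>
  for /f "usebackq tokens=5" %%A in ("%tempFile%") do echo ^<r a="%%A"/^>
  echo ^</rows^>
)

rem Step 3: clean up.
del "%tempFile%"
```

The filters and token position are unchanged from the version above; only the temp file handling differs.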
Another option would be to add the XML wrapper to the left side of your original pipe, and then modify your FINDSTR filters appropriately. But the above solution may still be faster, depending on the number of lines that get filtered out.
(
echo ^<rows^>
for /f "tokens=5" %%A in (test.log) do echo ^<r a="%%A"/^>
echo ^</rows^>
) | findstr /v /c:"modifiedFilter_1" /c:"modifiedFilter_2" > test.xml
FINDSTR will also need the /R option if the filters are regular expressions.
But a far faster solution would be to use something like sed for Windows, or either of the JScript/batch hybrid utilities: my REPL.BAT or Aacini's FINDREPL.BAT.
Upvotes: 1