Reputation: 21
I would like to split a text file of around 240000 lines. I want to cut it every 1000 lines, save each chunk as a new text file, and name the files in order with a timestamp-like suffix, e.g. abc_20141125110001. In every new file, the first line should be the same as the first line of the source file.
Source.txt:
aaabbbb
1111111
2222222 (total 24000000 lines)
output1.txt (e.g.abc_20141125110001)
aaabbbb
1111111
output2.txt (e.g.abc_20141125110002)
aaabbbb
2222222
I have written part of the code, but it does not seem to work. Please advise.
@echo off
Set /a file=100
Set /a line=1000
Set /a counter=0
Set firstline=This is line 1.
For /F "tokens=1*" %%a IN (abc.txt) Do (
set /a "remainder=%counter% %% %line%"
if %remainder% == "0" (
goto :createnew
)
else (
goto :append
)
)
goto :eof
:createnew
echo %firstline% >> Test%file%.txt
goto :append
:append
echo %%a >>Test%file%.txt
if %remainder% == "0" (
set /a file+=1
)
set /a counter+=1
:eof
Upvotes: 1
Views: 1336
Reputation: 70933
To be used as a base. The code to get the timestamp is very basic and locale dependent. In my case, with a date format of dd/mm/yyyy and a time format of hh:mm:ss,cc, it is the one shown. If your configuration is different, change the order of the variables.
@echo off
setlocal enableextensions enabledelayedexpansion
set "inputFile=test.txt"
for /f "tokens=1-10 delims=:.,/- " %%a in ("%date% %time%") do set "ts=%%c%%b%%a%%d%%e"
set "baseFileName=test_%ts: =0%"
set "lineLimit=1000"
set "fileNumber=10000000"
set "counter=%lineLimit%"
<"%inputFile%" set /p "header="
for /f "usebackq skip=1 delims=" %%a in ("%inputFile%") do (
set /a "counter+=1"
if !counter! gtr %lineLimit% (
set "counter=2"
set /a "fileNumber+=1"
set "outputFile=%baseFileName%!fileNumber:~-5!.txt"
echo !outputFile!
>> "!outputFile!" (
setlocal disabledelayedexpansion
echo(%header%
endlocal
)
)
>> "!outputFile!" (
setlocal disabledelayedexpansion
echo(%%a
endlocal
)
)
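Since the timestamp parsing above depends on the locale, here is a minimal sketch of one way to get a fixed-format timestamp instead. It is not part of the original answer, and it assumes powershell.exe is available on the PATH; the variable names simply mirror the ones used above.

```batch
@echo off
setlocal enableextensions

rem Assumption: powershell.exe is on the PATH (not part of the original answer).
rem Get-Date -Format returns yyyyMMddHHmmss regardless of the regional settings.
for /f %%a in ('powershell -NoProfile -Command "Get-Date -Format yyyyMMddHHmmss"') do set "ts=%%a"

rem Same base name pattern as above, without the space-to-zero fix-up
set "baseFileName=test_%ts%"
echo %baseFileName%
```

Starting PowerShell costs a noticeable fraction of a second, so this is best done once before the loop rather than once per output file.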
Edited: following the optimization proposed by Aacini and dbenham (minimizing the number of file operations and removing the goto and call commands), here is my own version:
@echo off
setlocal enableextensions enabledelayedexpansion
set "inputFile=test.txt"
rem Get timestamp
for /f "tokens=1-10 delims=:.,/- " %%a in ("%date% %time%") do set "ts=%%c%%b%%a%%d%%e"
set "baseFileName=test_%ts: =0%"
rem Configure line limits
set "lineLimit=1000"
rem Get the number of lines in input file (except the header)
for /f %%a in ('find /c /v "" ^< "%inputFile%"') do set /a "totalLines=%%a-1"
rem Calculate needed files
set /a "lineLimit-=1"
set /a "totalFiles=%totalLines% / %lineLimit%"
set /a "remain=%totalLines% %% %lineLimit%"
if %remain% gtr 0 set /a "totalFiles+=1"
rem Prepare header variable
set "header="
rem Open input file for read
< "%inputFile%" (
rem If needed get the header record
if not defined header set /p "header="
rem For each of the files that need to be generated
for /l %%f in (1 1 %totalFiles%) do (
rem Prepare the file name
set /a "fileNumber=10000000 + %%f"
set "outputFile=%baseFileName%!fileNumber:~-5!.txt"
echo !outputFile!
rem Determine the number of lines that will be stored in this file
if %%f equ %totalFiles% (
set /a "counter=%totalLines% - ((%totalFiles%-1)*%lineLimit%)"
) else (
set "counter=%lineLimit%"
)
rem Open output file
> "!outputFile!" (
rem Put header in output file
echo(!header!
rem Write into output file all the needed lines from input file
for /l %%a in (1 1 !counter!) do (
set /p "line=" && (echo(!line!) || (echo()
)
)
)
)
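To make the file-count arithmetic in the script above concrete, here is a small worked sketch using the 240000 lines mentioned in the question. The variable perFile is a name introduced only for this illustration; it stands for the decremented lineLimit (data lines per file, excluding the header).

```batch
@echo off
setlocal enableextensions

rem Illustration only: 240000 input lines, one of which is the header
set /a "totalLines=240000-1"
rem 1000 lines per output file, one of which is the repeated header
set /a "perFile=1000-1"

rem Integer division gives the number of full files, the modulo gives the leftover lines
set /a "totalFiles=totalLines / perFile"
set /a "remain=totalLines %% perFile"
if %remain% gtr 0 set /a "totalFiles+=1"

rem 239999 / 999 = 240 full files with 239 lines left over, so 241 files in total
echo %totalFiles% files, the last one holding %remain% data lines
```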
Upvotes: 1
Reputation: 130849
I don't understand why you would put a timestamp at the end of each output file name. In my answer I have simply appended a file number (zero padded to width 4) instead of a timestamp. You can modify the answer to include a timestamp if need be.
Manipulating large text files with pure batch is a pain - and relatively slow :-(
I believe the following is nearly the fastest possible solution using pure batch.
There are two significant limitations:
1) Lines must be <= 1021 bytes long.
2) Trailing control characters will be stripped from each line.
But empty lines, exclamation points, poison characters - they all work fine :-)
The code breaks the source file into n files, where each output file has the header line, followed by up to 1000 lines. The output files are named based on the source file. For example, "test.txt" becomes "test_0001.txt", "test_0002.txt", etc.
@echo off
setlocal enableDelayedExpansion
set "src=result3.txt"
:: Redirect input to the source file
call :main "%src%" <"%src%"
exit /b
:main
::Get number of lines in file
for /f %%N in ('type "%src%"^|find /c /v ""') do set cnt=%%N
::Get first line to use as header for each file
set "header="
set /p "header="
:: cnt includes the header, which has already been read, so start lineNum at 1
set /a fileNum=1, lineNum=1
:loop
:: Exit if done
if !lineNum! geq !cnt! exit /b
:: establish zero padded numeric suffix
set "suffix=000!fileNum!"
set "suffix=!suffix:~-4!"
>"%~n1_!suffix!%~x1" (
echo(!header!
for /l %%N in (1 1 1000) do if !lineNum! lss !cnt! (
set /a lineNum+=1
set "ln="
set /p "ln="
echo(!ln!
)
)
set /a fileNum+=1
goto :loop
Upvotes: 1
Reputation: 67216
The solution below should run fast because it minimizes the number of operations performed on each line and keeps the output file open the whole time via > redirection (instead of >> append, which opens and closes the file for each line).
@echo off
setlocal EnableDelayedExpansion
for /F "tokens=1-3 delims=/" %%a in ("%date%") do set "datePart=%%c%%a%%b"
< abc.txt call :SplitFile
goto :EOF
:SplitFile
rem Get the first line
set /P "firstline="
rem Place the next 1000 lines inside a new file
:nextFile
for /F "tokens=1-3 delims=:." %%a in ("%time%") do set timePart=%%a%%b%%c
set "timePart=%timePart: =0%"
echo Creating file: abc_%datePart%%timePart%.txt
(
echo %firstLine%
for /L %%i in (1,1,1000) do (
set "line="
set /P "line="
if defined line echo !line!
)
) > abc_%datePart%%timePart%.txt
if defined line goto nextFile
exit /B
This solution eliminates empty lines and may terminate the process early if an empty line appears at the end of a newly generated file. If the input file may contain empty lines, this method could be modified by defining a maximum number of possible empty lines.
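As a side illustration of the redirection point made at the top of this answer, here is a minimal sketch contrasting the two styles. The file names demo_append.txt and demo_block.txt are made up for the example; both loops write the same 1000 lines.

```batch
@echo off
setlocal enableextensions

rem Start clean, because >> would otherwise append to a file left by a previous run
if exist "demo_append.txt" del "demo_append.txt"

rem Per-line append: the file is opened and closed on every iteration
for /L %%i in (1,1,1000) do >> "demo_append.txt" echo line %%i

rem Block redirection: the file is opened once for the whole parenthesized block
> "demo_block.txt" (
    for /L %%i in (1,1,1000) do echo line %%i
)
```

Both files should end up with identical contents; the difference only shows up in run time as the line count grows.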
Upvotes: 0