user2164919

Reputation: 21

Trim text file with multiple lines using windows batch

I would like to split a text file of around 240000 lines. I want to cut it every 1000 lines and save each chunk as a new text file, named in sequence with a timestamp-like suffix, e.g. abc_20141125110001. In every new file, the first line should be the same as the first line of the source file.

Source.txt:
aaabbbb
1111111
2222222 (total 24000000 lines)

output1.txt (e.g.abc_20141125110001)
aaabbbb
1111111 

output2.txt (e.g.abc_20141125110002)
aaabbbb
2222222

I have written part of the code, but it does not seem to work. Please advise.

@echo off

Set /a file=100
Set /a line=1000
Set /a counter=0
Set firstline=This is line 1.


For /F "tokens=1*" %%a IN (abc.txt) Do (
    set /a "remainder=%counter% %% %line%"
    if %remainder% == "0" (
        goto :createnew
    )
    else (
        goto :append
    )
)

goto :eof

:createnew
echo %firstline% >> Test%file%.txt
goto :append

:append
echo %%a >>Test%file%.txt
if %remainder% == "0" (
    set /a file+=1
)
set /a counter+=1

:eof

Upvotes: 1

Views: 1336

Answers (3)

MC ND

Reputation: 70933

Use this as a base. The code that builds the timestamp is very basic and locale dependent. In my case, with a date format of dd/mm/yyyy and a time format of hh:mm:ss,cc, it is the one shown. If your configuration is different, change the order of the variables.

@echo off
    setlocal enableextensions enabledelayedexpansion

    set "inputFile=test.txt"

    for /f "tokens=1-10 delims=:.,/- " %%a in ("%date% %time%") do set "ts=%%c%%b%%a%%d%%e"
    set "baseFileName=test_%ts: =0%"

    set "lineLimit=1000"
    set "fileNumber=10000000"
    set "counter=%lineLimit%"

    <"%inputFile%" set /p "header="

    for /f "usebackq skip=1 delims=" %%a in ("%inputFile%") do (
        set /a "counter+=1"
        if !counter! gtr %lineLimit% (
            set "counter=2"
            set /a "fileNumber+=1"
            set "outputFile=%baseFileName%!fileNumber:~-5!.txt"
            echo !outputFile!
            >> "!outputFile!" (
                setlocal disabledelayedexpansion
                echo(%header%
                endlocal
            )
        )
        >> "!outputFile!" (
            setlocal disabledelayedexpansion
            echo(%%a
            endlocal
        )
    )
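
The timestamp line above depends on the regional date and time format. As a hedged alternative, the sketch below reads the clock through wmic, whose LocalDateTime value has a fixed yyyymmddhhmmss layout. It assumes wmic.exe is present (it is deprecated and may be missing on recent Windows builds), and the abc_ prefix is only illustrative.

```batch
@echo off
setlocal

rem Assumption: wmic.exe is available on this system.
rem The /value switch prints LocalDateTime=yyyymmddhhmmss.ffffff+zzz
set "ldt="
for /f "tokens=2 delims==" %%t in ('wmic os get LocalDateTime /value') do set "ldt=%%t"

if not defined ldt (
    echo Could not read the date from wmic
    exit /b 1
)

rem Keep only the first 14 characters: yyyymmddhhmmss
set "ts=%ldt:~0,14%"

rem Illustrative base name, not taken from the original script
echo abc_%ts%
```

The resulting ts value could replace the one built from %date% and %time%, so the script no longer needs adjusting per locale.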

Edited: following the optimization proposed by Aacini and dbenham (minimizing the number of file operations and removing gotos and calls), here is my own version:

@echo off
    setlocal enableextensions enabledelayedexpansion

    set "inputFile=test.txt"

    rem Get timestamp
    for /f "tokens=1-10 delims=:.,/- " %%a in ("%date% %time%") do set "ts=%%c%%b%%a%%d%%e"
    set "baseFileName=test_%ts: =0%"

    rem Configure line limits
    set "lineLimit=1000"

    rem Get the number of lines in input file (except the header)
    for /f %%a in ('find /c /v "" ^< "%inputFile%"') do set /a "totalLines=%%a-1"

    rem Calculate needed files 
    set /a "lineLimit-=1"
    set /a "totalFiles=%totalLines% / %lineLimit%"
    set /a "remain=%totalLines% %% %lineLimit%"
    if %remain% gtr 0 set /a "totalFiles+=1"

    rem Prepare header variable
    set "header="

    rem Open input file for read
    < "%inputFile%" (

        rem If needed get the header record
        if not defined header set /p "header="

        rem For each of the files that need to be generated
        for /l %%f in (1 1 %totalFiles%) do (

            rem Prepare the file name
            set /a "fileNumber=10000000 + %%f"
            set "outputFile=%baseFileName%!fileNumber:~-5!.txt"
            echo !outputFile!

            rem Determine the number of lines that will be stored in this file
            if %%f equ %totalFiles% (
                set /a "counter=%totalLines% - ((%totalFiles%-1)*%lineLimit%)"
            ) else (
                set "counter=%lineLimit%"
            )

            rem Open output file
            > "!outputFile!" (
                rem Put header in output file
                echo(!header!

                rem Write into output file all the needed lines from input file
                for /l %%a in (1 1 !counter!) do (
                    set /p "line=" && (echo(!line!) || (echo()
                )
            )
        )
    )
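
The file-count calculation in the script above is a ceiling division: divide the data lines by the lines per file and add one file if there is a remainder. As a small worked sketch with hypothetical figures (240000 lines in the source including the header, 1000 lines per output file including the header), the same result can be obtained in one expression:

```batch
@echo off
setlocal

rem Hypothetical figures: 240000 lines in total, one of them the header
set /a "totalLines=240000 - 1"

rem Each output file holds the header plus 999 data lines
set /a "perFile=1000 - 1"

rem Ceiling division: round up so a partial last file is counted
set /a "totalFiles=(totalLines + perFile - 1) / perFile"

rem Data lines left over for the last file
set /a "lastFile=totalLines - (totalFiles - 1) * perFile"

echo %totalFiles% files, the last one holding %lastFile% data lines
```

With these figures it prints "241 files, the last one holding 239 data lines", since 240 full files take 239760 of the 239999 data lines.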

Upvotes: 1

dbenham

Reputation: 130849

I don't understand why you would put a time stamp at the end of each output file name. In my answer I have simply appended a file number (zero padded to width 4), instead of a timestamp. You can modify the answer to include timestamp if need be.

Manipulating large text files with pure batch is a pain - and relatively slow :-(

I believe the following is nearly the fastest possible solution using pure batch.

There are two significant limitations:

1) lines must be <=1021 bytes long.
2) trailing control characters will be stripped from each line.

But empty lines, exclamation points, poison characters - they all work fine :-)

The code breaks the source file into n files, where each output file has the header line, followed by up to 1000 lines. The output files are named based on the source file. For example, "test.txt" becomes "test_0001.txt", "test_0002.txt", etc.

@echo off
setlocal enableDelayedExpansion

set "src=result3.txt"

:: Redirect input to the source file
call :main "%src%" <"%src%"
exit /b


:main

::Get number of lines in file
for /f %%N in ('type "%src%"^|find /c /v ""') do set cnt=%%N

::Get first line to use as header for each file
set "header="
set /p "header="

:: lineNum starts at 1 because cnt includes the header line, which has already been read
set /a fileNum=1, lineNum=1
 :loop
  :: Exit if done
  if !lineNum! geq !cnt! exit /b

  :: establish zero padded numeric suffix
  set "suffix=000!fileNum!"
  set "suffix=!suffix:~-4!"

  >"%~n1_!suffix!%~x1" (
    echo(!header!
    for /l %%N in (1 1 1000) do if !lineNum! lss !cnt! (
      set /a lineNum+=1
      set "ln="
      set /p "ln="
      echo(!ln!
    )
  )
  set /a fileNum+=1
  goto :loop

Upvotes: 1

Aacini

Reputation: 67216

The solution below should run fast because it minimizes the number of operations performed on each line and keeps the output file open the whole time via > redirection (instead of >> append redirection, which opens and closes the file for each line).

@echo off
setlocal EnableDelayedExpansion

for /F "tokens=1-3 delims=/" %%a in ("%date%") do set "datePart=%%c%%a%%b"
< abc.txt call :SplitFile
goto :EOF


:SplitFile

rem Get the first line
set /P "firstline="

rem Place the next 1000 lines inside a new file

:nextFile
for /F "tokens=1-3 delims=:." %%a in ("%time%") do set timePart=%%a%%b%%c
set "timePart=%timePart: =0%"
echo Creating file: abc_%datePart%%timePart%.txt
(
   echo %firstLine%
   for /L %%i in (1,1,1000) do (
      set "line="
      set /P "line="
      if defined line echo !line!
   )
) > abc_%datePart%%timePart%.txt
if defined line goto nextFile
exit /B

This solution drops empty lines, and it may stop early if an empty line happens to be the last line read for a newly generated file. If the input file may contain empty lines, this method could be modified by defining a maximum number of possible empty lines.
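
A different way around the empty-line caveat, sketched here under assumptions rather than as the modification described above, is to count the lines up front with find /c /v "" (the approach the other answers use) and stop when the count runs out instead of testing whether the last line was empty. The plain numeric file names (abc_1.txt, abc_2.txt, ...) are an assumption made to keep the sketch short, and the usual set /P limits on line length still apply.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Count all lines, then subtract the header line
for /F %%n in ('find /c /v "" ^< abc.txt') do set /A "remaining=%%n-1"

< abc.txt call :SplitFile
goto :EOF


:SplitFile

rem Get the first line
set /P "firstLine="
set "fileNum=0"

:nextFile
set /A "fileNum+=1"
echo Creating file: abc_%fileNum%.txt
(
   echo(!firstLine!
   rem Copy up to 1000 lines, empty ones included, while any remain
   for /L %%i in (1,1,1000) do if !remaining! gtr 0 (
      set /A "remaining-=1"
      set "line="
      set /P "line="
      echo(!line!
   )
) > abc_%fileNum%.txt
if %remaining% gtr 0 goto nextFile
exit /B
```

Because the loop is driven by the line count, an empty line is written as an empty line and never ends the process early.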

Upvotes: 0
