Prashant

Reputation: 3

Batch Script to split record by fixed length

I have an input text file containing a single line with thousands of records, one after another. I want to split that line into records of 10 characters each.

**Input Record** - 
====================== Begin of data =========================
 abcdefghijklmnopqrstuvwxyz1234567890       <= Input file having all records on single line
====================== End of data   =========================

**Expected output** -
====================== Begin of data =========================
 abcdefghij            <= each line of 10 characters
 klmnopqrst
 uvwxyz1234
 567890
====================== End of data   =========================

Please help me do this with a batch script.

Try using Notepad++

It worked well in Notepad++ with the regular expression below:

Find => (?-s).{10}
Replace => ${0}\r\n

The record length of 10 above was used only for simplicity; the actual record length is 800 bytes, and there are 50 thousand records in each line.

Upvotes: -2

Views: 123

Answers (3)

phuclv

Reputation: 41962

It's simpler in PowerShell, which has no line-length limit. For files that aren't very big, you can use this one-liner:

$ (Get-Content -Raw ./in.txt) -split '(.{10})' -ne '' | Set-Content out.txt

# Or the shortened version
$ (gc -Ra in.txt) -split '(.{10})' -ne '' >out.txt

Of course it's better to write the entire script in PowerShell, but if you really can't, you can simply call it from cmd or a batch file like this:

powershell -C "(gc -Ra in.txt) -split '(.{10})' -ne '' >out.txt"

This method reads the whole file into memory and splits it into 10-character strings using the .{10} regex, so it won't work for files that are gigabytes in size. For such huge files you can use this:

$ Get-Content -AsByteStream -ReadCount 10 ./in.txt | `
  ForEach-Object { [Text.Encoding]::ASCII.GetString($_) } | `
  Set-Content out.txt

# Or the shortened version
$ gc -A -Re 10 in.txt |% { [Text.Encoding]::ASCII.GetString($_) } >out.txt

This reads the input file as a byte stream, takes 10 bytes at a time, and writes each group as a string, so line length is not a limit.
Remember to select the correct encoding for your files by replacing [Text.Encoding]::ASCII with
[Text.Encoding]::GetEncoding("windows-1252") (the default charset in US Windows),
[Text.Encoding]::GetEncoding("iso-8859-1"), or whatever matches your input files (CP1252, ISO-8859-1, or another encoding). You can check the encoding in Notepad++.
For UTF-8 and UTF-16 you would use [Text.Encoding]::UTF8 and [Text.Encoding]::Unicode, but byte-based splitting won't quite work there because characters have variable byte lengths, so a 10-byte group can end in the middle of a character. You can use this solution instead:

Get-Content ./in.txt | ForEach-Object {
    $line = $_
    for ($i = 0; $i -lt $line.Length; $i += 10) {
        $line.Substring($i, [Math]::Min(10, $line.Length - $i))
    }
}
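
To illustrate why byte-based splitting can break UTF-8 text while character-based splitting does not, here is a small sketch in Python rather than PowerShell. The helper names split_chars and split_bytes and the sample string are made up for this illustration; they are not part of the answer above.

```python
def split_chars(text, width):
    """Split text into pieces of `width` characters (the last may be shorter)."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def split_bytes(data, width):
    """Split raw bytes into pieces of `width` bytes (the last may be shorter)."""
    return [data[i:i + width] for i in range(0, len(data), width)]


# Hypothetical sample: the 10th character is U+00E9, which takes 2 bytes in UTF-8.
sample = "abcdefghi\u00e91234567890"

char_records = split_chars(sample, 10)
byte_records = split_bytes(sample.encode("utf-8"), 10)

# The character-based split keeps the accented letter inside the first record.
print(char_records)

# The first 10-byte piece ends in the middle of that letter, so decoding it fails.
try:
    byte_records[0].decode("utf-8")
except UnicodeDecodeError as exc:
    print("byte-based split broke a character:", exc.reason)
```

With pure ASCII or any single-byte encoding the two approaches give the same records, which is why the byte-stream version is fine for those files.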

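You can call it from cmd as above, or add some options like these to speed up startup. Note that -AsByteStream (-A) exists only in PowerShell 6 and later (pwsh); in Windows PowerShell 5.1 the equivalent is -Encoding Byte, so there either call pwsh instead of powershell or use that parameter.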

powershell -NoProfile -ExecutionPolicy Bypass -NoLogo -NonInteractive -Command "gc -A -Re 10 in.txt |% { [Text.Encoding]::ASCII.GetString($_) } >out.txt"
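
For comparison, the chunked byte-stream read described above can be sketched in Python. This is only an illustration under a few assumptions: the function name split_stream is invented here, the input is assumed to contain no line breaks of its own, and each record is terminated with CR LF.

```python
import io


def split_stream(src, dst, record_len):
    """Copy src to dst, writing CR LF after every record_len bytes.

    Only one record is held in memory at a time, so the length of the
    input line does not matter. Returns the number of records written.
    """
    count = 0
    while True:
        record = src.read(record_len)
        if not record:          # empty read means end of input
            break
        dst.write(record + b"\r\n")
        count += 1
    return count


# Demo on in-memory streams using the sample data from the question.
src = io.BytesIO(b"abcdefghijklmnopqrstuvwxyz1234567890")
dst = io.BytesIO()
n = split_stream(src, dst, 10)
print(n)
print(dst.getvalue().decode("ascii"))
```

With real files, the same function could be called on streams opened in binary mode, for example open("in.txt", "rb") and open("out.txt", "wb").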

Upvotes: 0

Aacini

Reputation: 67236

@echo off
setlocal EnableDelayedExpansion

set "recLen=10"
set "chunk="
call :splitFile  < input.txt  > output.txt
goto :EOF


:splitFile

:nextChunk
rem Read next chunk and join to (remaining) previous one
set "newChunk="
set /P "newChunk="
if not defined newChunk goto EndOfFile
set "chunk=!chunk!!newChunk!"

rem Break current chunk in records of the required size
:nextRec
   echo !chunk:~0,%recLen%!
   set "chunk=!chunk:~%recLen%!"
   if not defined chunk goto nextChunk
if "!chunk:~%recLen%!" neq "" goto nextRec
goto nextChunk

:EndOfFile
if defined chunk echo !chunk!
exit /B

This method should work with records up to 1022 characters long. You can read further details about this method at this answer or this one.
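
The idea behind the script is to read the long line in pieces, join each piece to whatever was left over from the previous one, and emit complete records from the joined buffer. The following Python sketch shows that carry-over logic in a language-neutral way. It is not a line-by-line port of the batch code; the names split_chunks and read_in_chunks and the chunk size of 7 are chosen only for this illustration.

```python
def split_chunks(chunks, rec_len):
    """Yield rec_len-sized records from an iterable of arbitrarily sized chunks."""
    pending = ""                      # leftover text carried between chunks
    for chunk in chunks:
        pending += chunk
        while len(pending) >= rec_len:
            yield pending[:rec_len]
            pending = pending[rec_len:]
    if pending:                       # final, possibly shorter, record
        yield pending


def read_in_chunks(text, size):
    """Simulate a reader that returns at most `size` characters per call."""
    return (text[i:i + size] for i in range(0, len(text), size))


data = "abcdefghijklmnopqrstuvwxyz1234567890"
# The chunk size (7) deliberately does not divide evenly into the record length (10).
records = list(split_chunks(read_in_chunks(data, 7), 10))
print(records)
```

Because leftovers are carried forward, the records come out the same regardless of where the chunk boundaries fall.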

Upvotes: 3

Magoo

Reputation: 80193

@ECHO OFF
SETLOCAL
rem The following settings for the directories and filenames are names
rem that I use for testing and deliberately include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.

SET "sourcedir=u:\your files"
SET "destdir=u:\your results"
SET "filename1=%sourcedir%\q78869753.txt"
SET "outfile=%destdir%\outfile.txt"

(
FOR /f "usebackqdelims=" %%e IN ("%filename1%") DO SET "line=%%e"&CALL :sub
)>"%outfile%"

GOTO :EOF

:sub
IF NOT DEFINED line GOTO :eof
ECHO %line:~0,10%
SET "line=%line:~10%"
GOTO sub

Note that if the filename does not contain separators like spaces, then both usebackq and the quotes around %filename1% can be omitted.

You would need to change the values assigned to sourcedir and destdir to suit your circumstances. The listing uses a setting that suits my system.

I deliberately include spaces in names to ensure that the spaces are processed correctly.

I used a file named q78869753.txt containing your data plus some dummy data for my testing.

Produces the file defined as %outfile%

For documentation, see set /?, for /?, and call /? from the prompt, or the endless examples on SO.

Upvotes: 0
