Jeffrey Kevin Pry

Reputation: 3296

Splitting A File On Delimiter

I have a file on a Linux system that is roughly 10GB. It contains 20,000,000 binary records, but each record is separated by an ASCII delimiter "$". I would like to use the split command or some combination thereof to chunk the file into smaller parts. Ideally I would be able to specify that the command should split every 1,000 records (therefore every 1,000 delimiters) into separate files. Can anyone help with this?

Upvotes: 5

Views: 6259

Answers (3)

Note that by default the unix split command will exhaust its output-file suffixes once it reaches the default suffix length of two characters (26^2 = 676 files). Splitting 20,000,000 records into 1,000-record chunks needs 20,000 output files, so you will need the -a option to lengthen the suffix. More info: https://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html
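To make that concrete, here is a sketch using GNU split's -a and -d options on a small newline-delimited sample (the filename and sample data are hypothetical stand-ins for the real file):

```shell
# Build a small sample of newline-delimited records (hypothetical data).
seq 1 5000 > records.txt

# -l 1000 : 1000 records per chunk
# -a 4    : four-character suffixes, room for well over 20000 chunks
# -d      : numeric suffixes (x0000, x0001, ...) - GNU coreutils split only
split -l 1000 -a 4 -d records.txt
```

This produces x0000 through x0004; for the real file you would first convert the '$' delimiters to newlines as shown in the other answers.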

Upvotes: 1

sehe

Reputation: 392833

The only unorthodox part of the problem seems to be the record separator. I'm sure this is fixable in awk pretty simply - but I happen to hate awk.

I would transfer it in the realm of 'normal' problems first:

tr '$' '\n' < large_records.txt | split -l 1000

This will by default create files named xaa, xab, xac, ...; see man split for more options.
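Two caveats worth hedging: this assumes the binary records themselves never contain a newline byte (otherwise the line-based split would cut records apart), and the resulting chunks come out newline-delimited rather than '$'-delimited. A sketch of the full round trip, with hypothetical filenames and sample data:

```shell
# '$'-delimited sample standing in for the real 10GB file (hypothetical data).
printf 'rec%d$' $(seq 1 2500) > large_records.txt

# Convert delimiters and split into 1000-record chunks named chunk_aa, chunk_ab, ...
tr '$' '\n' < large_records.txt | split -l 1000 - chunk_

# If downstream tools expect the original '$' delimiter, convert back per chunk:
tr '\n' '$' < chunk_aa > chunk_aa.restored
```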

Upvotes: 5

symcbean

Reputation: 48357

I love awk :)

BEGIN { RS="$"; chunk=1; count=0; size=1000 }
{
   print $0 > ("/tmp/chunk" chunk);
   if (++count >= size) {
      close("/tmp/chunk" chunk);   # close each finished chunk, or awk runs out of file descriptors long before chunk 20000
      chunk++;
      count=0;
   }
}

(Note that the redirection operator in awk only truncates/creates the file on its first use; subsequent writes to the same name append, unlike shell redirection. Also note that print appends ORS, so each record comes out newline-terminated rather than '$'-terminated.)
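As a quick sanity check, the same approach can be run inline on a tiny '$'-delimited sample (chunk size lowered to 2 and output names shortened for the demo; these filenames are hypothetical):

```shell
# Five '$'-delimited records standing in for the real data.
printf 'a$b$c$d$e$' > data.txt

awk 'BEGIN { RS="$"; chunk=1; count=0; size=2 }
     {
        print $0 > ("chunk" chunk)
        if (++count >= size) {
           close("chunk" chunk)   # release the descriptor before moving on
           chunk++
           count=0
        }
     }' data.txt

# chunk1 now holds records a and b, chunk2 holds c and d, chunk3 holds e
```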

Upvotes: 2
