Greenhorn

Reputation: 1821

Split a .txt file based on content

I have a huge *.txt file as follows:

~~~~~~~~ small file content 1
~~~~~~~~ small file content 2
...
~~~~~~~~ small file content n

How do I split this into n files, preferably via bash?

Upvotes: 3

Views: 2476

Answers (3)

jaypal singh

Reputation: 77085

If every line of your huge text file is a piece that should go into its own small file, then this should work:

One-liner:

awk '{print >("SMALL_BATCH_OF_FILES_" NR)}' BIG_FILE

Test:

[jaypal:~/Temp] cat BIG_FILE
~~~~~~~~ small file content 1
~~~~~~~~ small file content 2
~~~~~~~~ small file content 3
~~~~~~~~ small file content 4
~~~~~~~~ small file content n-1
~~~~~~~~ small file content n

[jaypal:~/Temp] awk '{print >("SMALL_BATCH_OF_FILES_" NR)}' BIG_FILE

[jaypal:~/Temp] ls -lrt SMALL_BATCH_OF_FILES_*
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_6
-rw-r--r--  1 jaypalsingh  staff  32 17 Dec 14:19 SMALL_BATCH_OF_FILES_5
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_4
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_3
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_2
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_1

[jaypal:~/Temp] cat SMALL_BATCH_OF_FILES_1 
~~~~~~~~ small file content 1
[jaypal:~/Temp] cat SMALL_BATCH_OF_FILES_2 
~~~~~~~~ small file content 2
[jaypal:~/Temp] cat SMALL_BATCH_OF_FILES_6
~~~~~~~~ small file content n
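
For a really large input, one caveat may matter: some awk implementations keep every output file open until exit and can fail once the per-process limit on open files is reached. A minimal sketch of the same one-liner with an explicit close() after each write, assuming a throwaway temp directory and a three-line stand-in for BIG_FILE (both are illustrative, not from the question):

```shell
# Hypothetical sample input in a scratch directory; BIG_FILE stands in for the real file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  '~~~~~~~~ small file content 1' \
  '~~~~~~~~ small file content 2' \
  '~~~~~~~~ small file content 3' > BIG_FILE

# Same idea as the one-liner above, but each output file is closed right
# after its line is written, so only one output file is open at a time.
awk '{ out = "SMALL_BATCH_OF_FILES_" NR; print > out; close(out) }' BIG_FILE

ls SMALL_BATCH_OF_FILES_*
```

Because every output file receives exactly one line, closing it immediately loses nothing; if several lines went to the same file, a later plain > after close() would truncate it, and >> would be needed instead.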

Upvotes: 0

Zsolt Botykai

Reputation: 51593

With awk:

awk 'BEGIN {c=1} NR > 1 && NR % 10000 == 1 { c++ } { print $0 > ("splitfile_" c) }' LARGEFILE

will do. It sets up a counter that is incremented every 10000 lines, then writes each line to the `splitfile_` file for the current counter value.
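
If fixed-size chunks are all that is needed, the standard split utility is a different tool that probably does the same job without awk. A small sketch, assuming a 25-line stand-in for LARGEFILE and 10 lines per piece so the result is easy to inspect (swap in 10000 for the real case):

```shell
# Hypothetical 25-line input in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
awk 'BEGIN { for (i = 1; i <= 25; i++) print "line " i }' > LARGEFILE

# -l N: put N lines in each piece; the last operand is the output prefix.
# Pieces are named splitfile_aa, splitfile_ab, splitfile_ac, ...
split -l 10 LARGEFILE splitfile_

wc -l splitfile_*
```

Note that split names its pieces with alphabetic suffixes rather than the numeric counter used in the awk version.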

HTH

Upvotes: 0

Fredrik Pihl

Reputation: 45644

Use csplit

$ csplit --help
Usage: csplit [OPTION]... FILE PATTERN...
Output pieces of FILE separated by PATTERN(s) to files `xx00', `xx01', ...,
and output byte counts of each piece to standard output.
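
As a rough illustration of how that could look for the file in the question, assuming each piece starts with a line beginning with ~~~~~~~~ and that GNU csplit is available (-z and '{*}' are GNU extensions; the part_ prefix and the sample content are made up for the example):

```shell
# Hypothetical sample input in a scratch directory; the first piece spans two lines.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  '~~~~~~~~ small file content 1' \
  'more of piece 1' \
  '~~~~~~~~ small file content 2' \
  '~~~~~~~~ small file content 3' > BIG_FILE

# -s        do not print the byte count of each piece
# -z        remove empty pieces (the one before the first marker line)
# -f part_  name the pieces part_00, part_01, ... instead of xx00, xx01, ...
# '{*}'     repeat the preceding pattern as many times as possible
csplit -s -z -f part_ BIG_FILE '/^~~~~~~~~/' '{*}'

ls part_*
```

Each piece then begins at a marker line and runs up to, but not including, the next one, so multi-line pieces stay together.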

Upvotes: 13
