Fornax-A

Reputation: 1032

split large file into small files with condition

Hi, I am trying to split a big file.dat (120 MB) into many smaller files.

I know the split command can do this for me, for example:

split --lines=#number file.dat

but this divides my big file into files that all have the same number of lines (#number).

If instead I want to split the file on an if-like condition, how can I do it?

For instance: I want to start a new file whenever the integer part of the first column differs from that of the previous line.

An example file.dat would be:

1.2 432.1 87.1
1.3  3.5 557.2
2.1 1.2 43.56
2.33 19.2 34.7
2.4 32.6 41.8
2.56 23.5 66.9
4.1 143.7  54.0
5.5 432.8 23.4
6.7 423.9 0.3

With this example I should get 5 different files: the first with the first two lines, the second with four lines, the third with one line, and so on. Is this possible? Thanks to all.

Upvotes: 1

Views: 565

Answers (2)

karakfa

Reputation: 67567

awk to the rescue!

$ awk '    NR==1{p=int($1);c=1} 
      int($1)==p{print > "file"c".seq";next} 
                {p=int($1);c++;print > "file"c".seq"}' input


$ ls file*.seq
file1.seq  file2.seq  file3.seq  file4.seq  file5.seq

$ cat file*.seq
1.2 432.1 87.1
1.3  3.5 557.2
2.1 1.2 43.56
2.33 19.2 34.7
2.4 32.6 41.8
2.56 23.5 66.9
4.1 143.7  54.0
5.5 432.8 23.4
6.7 423.9 0.3

$ wc -l file*.seq
  2 file1.seq
  4 file2.seq
  1 file3.seq
  1 file4.seq
  1 file5.seq
  9 total

If awk ends up with too many files open at once, you need to close each file once you are done writing to it. Change

{p=int($1);c++;print > "file"c".seq"}

to

{close("file"c".seq");p=int($1);c++;print > "file"c".seq"}
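For reference, here is a compact sketch of the same idea with the close() call folded in. It is only one possible way to write it: the variable names prev and out are mine, and the first printf just recreates the sample data from the question in a file called input so the snippet can be run on its own.

```shell
# Recreate the sample data from the question (the file name "input" is illustrative).
printf '%s\n' '1.2 432.1 87.1' '1.3  3.5 557.2' '2.1 1.2 43.56' \
  '2.33 19.2 34.7' '2.4 32.6 41.8' '2.56 23.5 66.9' \
  '4.1 143.7  54.0' '5.5 432.8 23.4' '6.7 423.9 0.3' > input

awk '
    # On the first line, or whenever the integer part of column 1 changes,
    # close the previous output file and move on to the next numbered file.
    NR == 1 || int($1) != prev {
        if (out != "") close(out)
        prev = int($1)
        out = "file" (++c) ".seq"
    }
    # Every line goes to whichever file is currently selected.
    { print > out }
' input
```

Because at most one output file is open at any time, this should not hit the open-file limit regardless of how many groups the input contains.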

Upvotes: 2

Mr. Llama

Reputation: 20919

Assuming you're not looking for pure bash, awk can redirect print statements to individual files.

For example, you can redirect to a file based on the value of your first field:

awk '{
    outfile = $1 ".txt"
    print $0 > outfile
}' input_file.txt

Note that the above code will need some tweaking to work in your case, but it should be enough to get you started.
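One possible tweak, as a rough sketch: build the file name from int($1) instead of the whole first field, so that all lines sharing an integer part land in the same file. The part_ prefix and the prev variable are hypothetical choices of mine, and the sketch assumes the input is already sorted by the first column, as in the question's example. If a group could reappear later in the file, the close() followed by a fresh > would truncate the earlier output.

```shell
# Recreate the sample data from the question so the snippet runs on its own.
printf '%s\n' '1.2 432.1 87.1' '1.3  3.5 557.2' '2.1 1.2 43.56' \
  '2.33 19.2 34.7' '2.4 32.6 41.8' '2.56 23.5 66.9' \
  '4.1 143.7  54.0' '5.5 432.8 23.4' '6.7 423.9 0.3' > input_file.txt

awk '{
    # Name the output after the integer part of the first field
    # ("part_" is a hypothetical prefix, not something from the question).
    outfile = "part_" int($1) ".txt"
    if (outfile != prev) {
        # Integer part changed: release the previous file handle.
        if (prev != "") close(prev)
        prev = outfile
    }
    print $0 > outfile
}' input_file.txt
```

With the sample data this gives part_1.txt, part_2.txt, part_4.txt, part_5.txt and part_6.txt, so the file names follow the values in the data rather than a running counter.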

Upvotes: 0
