Reputation: 2033
I wish to split a large file (~17M lines of strings) into multiple files with a varying number of lines in each chunk. Would it be possible to pass an array to the 'split -l' command, like this:
[
1=>1000000,
2=>1000537,
...
]
so that each chunk receives that many lines?
Upvotes: 8
Views: 3385
Reputation: 123690
Use a compound command. Everything inside the braces reads from the same file descriptor, so each command picks up where the previous one stopped:
{
    head -n 10000 > output1
    head -n 200 > output2
    head -n 1234 > output3
    cat > remainder
} < yourbigfile
This also works with loops:
{
    i=1
    for n in 10000 200 1234
    do
        head -n $n > output$i
        let i++
    done
    cat > remainder
} < yourbigfile
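To drive this with a list of sizes like the one in the question, put them in a bash array (a minimal sketch; the sizes and file names are placeholders):
sizes=(1000000 1000537)
{
    i=1
    for n in "${sizes[@]}"
    do
        # each head resumes where the previous one stopped
        head -n "$n" > "output$i"
        i=$((i+1))
    done
    cat > remainder
} < yourbigfile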
This does not work on OS X, where head reads ahead and discards additional input, so the next command in the braces starts at the wrong position.
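If you are stuck on OS X, one portable workaround (my sketch, not part of this answer; file names are placeholders) is to do the whole split in a single awk pass instead:
awk -v sizes="10000 200 1234" '
BEGIN { n = split(sizes, s, " "); i = 1; c = 0 }
{
    if (i <= n) {
        print > ("output" i)
        # when the current chunk is full, close it and move on
        if (++c == s[i]) { close("output" i); i++; c = 0 }
    } else {
        print > "remainder"
    }
}' yourbigfile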
Upvotes: 12
Reputation: 4903
You could use sed, by getting another script to generate the sed commands for you.
# split_gen.pl
use strict;
use warnings;
my @limits = (100, 250, 340, 999);
my $filename = "joker";
my $start = 1;
foreach my $end (@limits) {
    print qq{sed -n '$start,${end}p;${end}q' $filename > $filename.$start-$end\n};
    $start = $end + 1;
}
Run it with perl split_gen.pl, giving:
sed -n '1,100p;100q' joker > joker.1-100
sed -n '101,250p;250q' joker > joker.101-250
sed -n '251,340p;340q' joker > joker.251-340
sed -n '341,999p;999q' joker > joker.341-999
If you're happy with the commands then you can run
perl split_gen.pl | sh
Then enjoy the wait: the ${end}q makes each sed quit at the last wanted line, but every invocation still re-reads the file from the beginning, so it may be slow with big files.
Upvotes: 1
Reputation: 45135
The split command does not have that capability, so you'll have to use a different tool or write one of your own.
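For illustration (not from the answer), the closest split itself gets is a single fixed line count for every chunk:
# every piece gets exactly 1000000 lines (the last may be shorter);
# pieces are named xaa, xab, xac, ...
split -l 1000000 yourbigfile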
Upvotes: 1