Reputation: 21
I have a CSV file (foo.csv) with 200,000 rows. I need to break it into four files (foo1.csv, foo2.csv... etc.) with 50,000 rows each.
I already tried simple copy/paste (Ctrl-C/Ctrl-V) in GUI text editors, but my computer slows to a halt.
What unix command(s) could I use to accomplish this task?
Upvotes: 2
Views: 9236
Reputation: 5180
I wrote this little shell script for a topic very similar to yours.
This shell script + awk works fine for me:
#!/bin/bash
# Print the lines of the file given as $3 whose line numbers fall
# between $1 (initial_line) and $2 (end_line), inclusive.
awk -v initial_line="$1" -v end_line="$2" '{
    if (NR >= initial_line && NR <= end_line)
        print $0
}' "$3"
Used with this sample file (file.txt):
one
two
three
four
five
six
The command (it extracts the second through fourth lines of the file):
edu@debian5:~$ ./script.sh 2 4 file.txt
Output of this command:
two
three
four
Of course, you can improve it, for example by testing that all argument values are as expected :-)
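A minimal sketch of such a check (the usage message and tests below are my own additions, not part of the original script) could go at the top of the script, before the awk call:
# Hypothetical argument check: require exactly three arguments and a readable file.
if [ "$#" -ne 3 ] || [ ! -f "$3" ]; then
    echo "usage: $0 initial_line end_line file" >&2
    exit 1
fi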
Upvotes: 0
Reputation: 3873
You should use head and tail:
head -n 50000 myfile > part1.csv
head -n 100000 myfile | tail -n 50000 > part2.csv
head -n 150000 myfile | tail -n 50000 > part3.csv
etc ...
Otherwise, though without control over the file names, you can use the Unix split command.
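For example, a minimal sketch (the part_ prefix is my own choice, not from this answer):
# Split foo.csv into 50,000-line pieces named part_aa, part_ab, part_ac, part_ad.
split -l 50000 foo.csv part_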
Upvotes: 6
Reputation: 258228
I don't have a terminal handy to try it out, but it should be just split -d -l 50000 foo.csv foo.csv.
Hopefully the naming isn't terribly important, because with the -d option the output files will be named foo.csv00 .. foo.csv03. You can add the -a 1 option so that the suffixes are 0-3, but there's no simple way to get the suffix injected into the middle of the filename.
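If the exact foo1.csv .. foo4.csv names do matter, one workaround (a sketch of my own, not part of this answer; it assumes GNU split) is to split and then rename the pieces:
# Split into four numeric-suffixed pieces (foo.csv0 .. foo.csv3), then
# rename them to the foo1.csv .. foo4.csv names asked for in the question.
split -d -a 1 -l 50000 foo.csv foo.csv
for n in 0 1 2 3; do
    mv "foo.csv$n" "foo$((n + 1)).csv"
done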
Upvotes: 4
Reputation: 18782
sed -n 2000,4000p somefile.txt
will print lines 2000 through 4000 to stdout.
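Applied to the question's file, a sketch (assuming foo.csv really has 200,000 lines) would be:
# Each sed invocation prints one 50,000-line slice of foo.csv to its own file.
sed -n '1,50000p' foo.csv > foo1.csv
sed -n '50001,100000p' foo.csv > foo2.csv
sed -n '100001,150000p' foo.csv > foo3.csv
sed -n '150001,200000p' foo.csv > foo4.csv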
Upvotes: 3