Reputation: 21
I have a CSV file (foo.csv) with 200,000 rows. I need to break it into four files (foo1.csv, foo2.csv... etc.) with 50,000 rows each.
I already tried simple copy/paste (Ctrl-C/Ctrl-V) in GUI text editors, but my computer slows to a halt.
What unix command(s) could I use to accomplish this task?
Upvotes: 2
Views: 9236
Reputation: 5180
I wrote this little shell script for a topic very similar to yours.
This shell script + awk works fine for me:
#!/bin/bash
# Print the lines of the file given as $3 whose line numbers fall
# between $1 (initial_line) and $2 (end_line), inclusive.
awk -v initial_line="$1" -v end_line="$2" '{
    if (NR >= initial_line && NR <= end_line)
        print $0
}' "$3"
Used with this sample file (file.txt):
one
two
three
four
five
six
The command (it extracts the second through fourth lines of the file):
edu@debian5:~$ ./script.sh 2 4 file.txt
Output of this command:
two
three
four
Of course, you can improve it, for example by testing that all argument values are as expected :-)
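A minimal sketch of such a check (the usage message and tests below are my own additions, not part of the original script) could go at the top of the script, before the awk call:
# Hypothetical argument check: require exactly three arguments and a readable file.
if [ "$#" -ne 3 ] || [ ! -f "$3" ]; then
    echo "usage: $0 initial_line end_line file" >&2
    exit 1
fi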
Upvotes: 0
Reputation: 3873
You should use head and tail:
head -n 50000 myfile > part1.csv
head -n 100000 myfile | tail -n 50000 > part2.csv
head -n 150000 myfile | tail -n 50000 > part3.csv
etc ...
Otherwise, though without control over the file names, you can use the Unix split command.
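For example, a minimal sketch (the part_ prefix is my own choice, not from this answer):
# Split foo.csv into 50,000-line pieces named part_aa, part_ab, part_ac, part_ad.
split -l 50000 foo.csv part_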
Upvotes: 6
Reputation: 258228
I don't have a terminal handy to try it out, but it should be just split -d -l 50000 foo.csv foo.csv.
Hopefully the naming isn't terribly important, because with the -d option the output files will be named foo.csv00 .. foo.csv03. You can add the -a 1 option so that the suffixes are 0-3, but there's no simple way to get the suffix injected into the middle of the filename.
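If the exact foo1.csv .. foo4.csv names do matter, one workaround (a sketch of my own, not part of this answer; it assumes GNU split) is to split and then rename the pieces:
# Split into four numeric-suffixed pieces (foo.csv0 .. foo.csv3), then
# rename them to the foo1.csv .. foo4.csv names asked for in the question.
split -d -a 1 -l 50000 foo.csv foo.csv
for n in 0 1 2 3; do
    mv "foo.csv$n" "foo$((n + 1)).csv"
done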
Upvotes: 4
Reputation: 18782
sed -n 2000,4000p somefile.txt
will print lines 2000 through 4000 to stdout.
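Applied to the question's file, a sketch (assuming foo.csv really has 200,000 lines) would be:
# Each sed invocation prints one 50,000-line slice of foo.csv to its own file.
sed -n '1,50000p' foo.csv > foo1.csv
sed -n '50001,100000p' foo.csv > foo2.csv
sed -n '100001,150000p' foo.csv > foo3.csv
sed -n '150001,200000p' foo.csv > foo4.csv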
Upvotes: 3