Viewing a portion of a very large CSV file?

Question

I have a ~1.0 GB CSV file, and when I try to load it into Excel just to view it, Excel crashes. I don't know the schema of the file, so it's difficult for me to load it into R or Python. The file contains restaurant reviews, and the review text has commas in it.

How can I open just a portion of the file (say, the first 100 rows, or 1.0 MB's worth) in Windows Notepad or Excel?

Dilettant · Accepted Answer

In my version of Excel, the open dialogs do not seem to offer a "read only this many lines" option, only a "start at line" option (used to skip headers, I guess).

So if you have no head binary at hand on your platform but do have Python, a simplistic working solution for your case would be (hard-coded to 100 lines, i.e. rows):

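#! /usr/bin/env python
"""Copy the first LINE_COUNT lines of one file into another."""
from __future__ import print_function

import itertools
import sys

LINE_COUNT = 100  # number of lines to copy


def main():
    """Copy the first LINE_COUNT lines of argv[1] into argv[2]."""
    if len(sys.argv) != 3:
        sys.exit("Usage: InFile OutHead100File")
    in_name, out_name = sys.argv[1:3]
    print("Simple head(100)[%s] -> %s ..." % (in_name, out_name))
    # Binary mode copies the bytes unchanged, so no encoding has to be guessed.
    with open(in_name, 'rb') as f_in, open(out_name, 'wb') as f_out:
        # islice stops after LINE_COUNT lines, or earlier at end of file.
        f_out.writelines(itertools.islice(f_in, LINE_COUNT))


if __name__ == '__main__':
    main()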

One would call the above code as follows (assuming it is stored in the script file so_x_head_100.py, and that the first 100 rows of huge.csv should be copied to 100.csv):

$ python2 ./so_x_head_100.py huge.csv 100.csv
Simple head(100)[huge.csv] -> 100.csv ...

And now 100.csv contains the first 100 lines of huge.csv.
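
The question also asks for "1.0mb's worth", and review text may contain quoted line breaks, in which case a line is not always a full row. Below is a minimal sketch of both variants. It assumes Python 3 and a UTF-8 encoded file (both are assumptions, adjust as needed), and the function names head_bytes and head_records are made up here for illustration:

```python
import csv
import itertools


def head_bytes(in_name, out_name, max_bytes=1024 * 1024):
    """Copy at most max_bytes, cut back to the last complete line."""
    with open(in_name, 'rb') as f_in:
        chunk = f_in.read(max_bytes)
        at_eof = not f_in.read(1)  # nothing left means we copied it all
    if not at_eof:
        # The chunk may end mid-line; keep only whole lines if there are any.
        cut = chunk.rfind(b'\n')
        if cut != -1:
            chunk = chunk[:cut + 1]
    with open(out_name, 'wb') as f_out:
        f_out.write(chunk)
    return len(chunk)


def head_records(in_name, out_name, count=100, encoding='utf-8'):
    """Copy the first `count` CSV records (header included).

    The csv module treats a quoted field with embedded line breaks as part
    of one record. The encoding default is an assumption about the file.
    """
    written = 0
    with open(in_name, newline='', encoding=encoding) as f_in, \
            open(out_name, 'w', newline='', encoding=encoding) as f_out:
        writer = csv.writer(f_out)
        for record in itertools.islice(csv.reader(f_in), count):
            writer.writerow(record)
            written += 1
    return written
```

For example, head_bytes('huge.csv', '1mb.csv') writes roughly the first megabyte, and head_records('huge.csv', '100.csv') writes the first 100 records. Note that head_records re-quotes fields on output, so the result is equivalent CSV but not necessarily byte-identical to the input.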
