Reputation: 12515
I have a ~1.0 GB CSV file, and when I try to load it into Excel just to view it, Excel crashes. I don't know the schema of the file, so it's difficult for me to load it into R or Python. The file contains restaurant reviews and has commas in it.
How can I open just a portion of the file (say, the first 100 rows, or 1.0 MB's worth) in Windows Notepad or Excel?
Upvotes: 2
Views: 2571
Reputation: 8212
If you want to do somewhat more selective fishing for particular rows, then the Python csv module will allow you to read the CSV file row by row into Python data structures. Consult the documentation.
This may be useful if just grabbing the first hundred lines reveals nothing about many of the columns because they are blank in all those rows. So you could easily write a program in Python to read as many rows as it takes to find and write out a few rows with non-blank data in particular columns. Likewise if you want to analyze a subset of the data matching particular criteria, you can read all the rows in and write only the interesting ones out for further analysis.
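A minimal sketch of that idea, assuming Python 3, a header row on the first line, and UTF-8 text. The function name copy_rows_with_data and its parameters are made up for illustration and are not part of the csv module:

```python
import csv


def copy_rows_with_data(in_path, out_path, required_columns, max_rows=5,
                        encoding='utf-8'):
    """Copy up to max_rows rows whose required_columns are all non-blank.

    Returns the number of data rows written (the header is not counted).
    """
    written = 0
    # newline='' lets the csv module deal with quoted fields that contain
    # commas or embedded line breaks, which review text often does.
    with open(in_path, newline='', encoding=encoding) as f_in, \
            open(out_path, 'w', newline='', encoding=encoding) as f_out:
        reader = csv.DictReader(f_in)  # the first row is taken as the header
        if reader.fieldnames is None:  # empty input file
            return 0
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames,
                                extrasaction='ignore')
        writer.writeheader()
        for row in reader:
            # Keep the row only if every required column has real content.
            if all((row.get(col) or '').strip() for col in required_columns):
                writer.writerow(row)
                written += 1
                if written >= max_rows:
                    break  # stop early; the rest of the file is never read
    return written
```

Something like copy_rows_with_data('huge.csv', 'sample.csv', ['review', 'stars']) would then produce a small file that opens in Excel, where 'review' and 'stars' stand in for whatever column names the header actually has. Because the file is read one row at a time, memory use stays small no matter how large the input is.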
An alternative to csv is pandas. It has a bigger learning curve, but it is probably the right tool for analyzing big data (1 GB is not very big these days).
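For what it's worth, here is a rough sketch of the pandas route, assuming pandas is installed and the file has a header row. The helper names peek and rows_with_value are invented for this example; nrows and chunksize are real read_csv parameters:

```python
import pandas as pd


def peek(source, n_rows=100):
    """Load only the first n_rows data rows (plus the header)."""
    return pd.read_csv(source, nrows=n_rows)


def rows_with_value(source, column, chunk_rows=100000):
    """Scan the file in chunks and keep rows where `column` is not blank."""
    kept = []
    # chunksize makes read_csv return an iterator of DataFrames, so only
    # chunk_rows rows are parsed and held at a time.
    for chunk in pd.read_csv(source, chunksize=chunk_rows):
        kept.append(chunk[chunk[column].notna()])  # blank fields load as NaN
    return pd.concat(kept, ignore_index=True)
```

peek('huge.csv').to_csv('100.csv', index=False) would write a 100-row sample that Excel can open, and peek('huge.csv').dtypes gives a first guess at the schema. The source argument can be a path or any file-like object.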
Upvotes: 1
Reputation: 3345
In my version of Excel, the open dialogs do not seem to offer a "read only this many lines" option, only a "start at line" option (used to skip headers, I guess).
So if you have no head binary at hand on your platform but do have Python, a simplistic working solution for your case would be the following (hard-coded to 100 lines, i.e. rows):
#! /usr/bin/env python
"""Copy the first LINE_COUNT lines of one file into another."""
from __future__ import print_function
import sys

LINE_COUNT = 100


def main():
    """Copy the first LINE_COUNT lines of the input file to the output file."""
    if len(sys.argv) != 3:
        sys.exit("Usage: InFile OutHead100File")
    in_name, out_name = sys.argv[1:3]
    print("Simple head(100)[%s] -> %s ..." % (in_name, out_name))
    with open(in_name, 'rt') as f_in, open(out_name, 'wt') as f_out:
        # readline() returns '' at end of file, so shorter files are fine too.
        for n in range(LINE_COUNT):
            f_out.write(f_in.readline())


if __name__ == '__main__':
    main()
One would call the above code as follows (assuming it is stored in the script file so_x_head_100.py, and that the first 100 rows of a file huge.csv should be copied to a file 100.csv):
$ python2 ./so_x_head_100.py huge.csv 100.csv
Simple head(100)[huge.csv] -> 100.csv ...
100.csv now contains the first 100 lines of huge.csv.
Upvotes: 4