David michael

Reputation: 155

Viewing a very large CSV file?

I have a very large 3.5 GB CSV file that I'd like to be able to read, sort through and filter for results based on various inputs. I'm pretty sure I can just import it to a MySQL database and go from there, but is there any program or online tool available that involves simply uploading the CSV and the rest is automatic?

Upvotes: 9

Views: 53114

Answers (11)

Valerie

Reputation: 1

I'm loading files with 8 million records... 400 fields in each row.

  1. Download the .csv file.
  2. Open Excel.
  3. Go to the Data menu.
  4. Import Data.
  5. Load to "Create Connection Only".
  6. Edit the connection.
  7. Go to the Rows button on the menu.
  8. Select only 5 rows.
  9. Load the 5 rows to another tab in Excel.
  10. Save this tab as a .csv.
  11. Open up SSMS.
  12. Use the Import Flat File wizard.
  13. Load the 5 rows into the database.
  14. Delete the 5 rows.
  15. Change the field sizes to the correct ones (note: I used varchar(max) on every field since I have 400 fields; see the ALTER TABLE sketch after the bulk insert code).
  16. Use the BULK INSERT code below to import the 8 million rows from the original .csv file into the SQL table, skipping the header row.

BULK INSERT

Bulk Insert DB.dbo.TableName
From 'C:\Path....\FileName.csv'
WITH (FORMAT = 'CSV', FieldTerminator = ',', RowTerminator = '0x0a', FirstRow = 2)
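
For step 15, a minimal sketch of tightening a column later, once the real data lengths are known (the column name here is a placeholder, not from the original post):

/* shrink one of the varchar(max) columns to an appropriate size */
ALTER TABLE DB.dbo.TableName ALTER COLUMN SomeField varchar(100);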

Upvotes: 0

Crystal L

Reputation: 571

You can try Acho. It's an online tool and provides a free trial as well. I recommend it because its interface looks great and is intuitive. It also has all the features you mentioned, including sorting and filtering values. Basically, I use it to shrink the dataset and then export it to Python for further analysis.

Upvotes: 0

A W

Reputation: 1129

You could use Excel's built-in data connection to do this.

Original source: https://excel.officetuts.net/en/examples/open-large-csv

Steps:

  1. Create a new Excel file.
  2. Navigate to Data >> Get & Transform Data >> From File >> From Text/CSV and import the CSV file.
  3. After a while, you will get a window with the file preview.
  4. Click the little triangle next to the Load button.
  5. Select Load To…
  6. Now we need to create a connection and add the data to the Data Model. This won't load the data into an Excel sheet, because sheets are limited to about a million rows.
  7. Click OK. This will take a while to load.
  8. On the right side, you have the name of the file and the number of rows. If you save the file, you will notice that its size has increased significantly.


  9. Double-click this area to open the Power Query Editor.
  10. Now, if you scroll down, you will notice that new rows are added on the fly.
  11. To change a value, right-click a cell and select Replace Values.
  12. This will replace all “builders” with “roofers” under the “profession” column.

Upvotes: 0

Desmond830

Reputation: 21

If it's a flat .CSV file and it does not involve a data pipeline, I'm not exactly sure what you mean by "the rest is automatic".

For accessing larger .CSV files, the typical solutions are:

  1. Insert your .CSV file into a SQL database such as MySQL or PostgreSQL.

You'll need to design a table schema, find a server to host the database, and write server-side code to maintain or change the database (see the SQL sketch after this list).

  2. Process your data using Python or R.

Running Python or R on GBs of data will put a lot of stress on your local computer. They are also better suited to data exploration and analytics than to table manipulation.

  3. Find a data hub for your data, for example Acho Studio.

A data hub is much easier, but its cost may vary. It comes with a GUI that helps you sort and filter through a table fairly easily.
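
For option 1, a minimal sketch of loading the CSV into MySQL; the table name, column names, and file path are placeholders, not values from the question:

/* schema with placeholder columns; adjust to match the CSV's header */
CREATE TABLE my_csv (
  id INT,
  name VARCHAR(255),
  amount DECIMAL(12,2)
);

/* bulk-load the file, skipping the header row */
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE my_csv
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

Once loaded, the sorting and filtering from the question become ordinary SELECT ... WHERE ... ORDER BY queries, ideally with an index on the filtered columns.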

Upvotes: 0

jastr

Reputation: 889

CSV Explorer is an online tool to read, sort, and filter CSVs with millions of rows. Upload the CSV and it will automatically import it and let you start working with the data.

https://www.CSVExplorer.com

Upvotes: 1

Siddarth Kanted

Reputation: 5914

Since it is a CSV file, you can open it with OpenRefine:

  1. Download it from http://openrefine.org/download.html.
  2. It is open source. Unzip openrefine.zip.
  3. Run openrefine-2.7-rc.1\openrefine.exe.
  4. It is a web app, so open http://127.0.0.1:3333/ in Chrome.
  5. Upload the large CSV file. In my case the file size was 3.61 GB and it opened successfully.

https://snag.gy/02WECq.jpg

Upvotes: 6

Neil McGuigan

Reputation: 48277

You could try PostgreSQL 9.1+ and its file_fdw (File Foreign Data Wrapper), which presents the CSV file as if it were a table. If you replace the CSV file with another CSV file of the same name, you will see the new data in the database immediately.

You can improve performance by using a materialized view (PG 9.3+) which essentially creates a real database table from the CSV data. You could use pgAgent to refresh the materialized view on a schedule.
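
A minimal sketch of that setup, assuming PostgreSQL 9.3+; the server, table, and column names are placeholders, and the file path reuses the one from the COPY example below:

/* enable the foreign data wrapper (once per database) */
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

/* expose the CSV as a foreign table; columns must match the file */
CREATE FOREIGN TABLE my_csv_fdw (
  some_field text
) SERVER csv_files
OPTIONS (filename '/srv/vendor-name/my.csv', format 'csv', header 'true');

/* materialize it into a real table for faster repeated queries (PG 9.3+) */
CREATE MATERIALIZED VIEW my_csv_mv AS SELECT * FROM my_csv_fdw;

/* refresh (e.g. on a pgAgent schedule) whenever the CSV file is replaced */
REFRESH MATERIALIZED VIEW my_csv_mv;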

Another alternative would be to use the COPY statement:

/* the columns in this table are the same as the columns in your csv: */
create table if not exists my_csv (
  some_field text, ...
);

/* COPY appends, so truncate the table if loading fresh data again: */
truncate table my_csv;

/* 
you need to be a postgres superuser to use COPY 
use psql \copy if you can't be superuser 
put the csv file in /srv/vendor-name/
*/

copy 
  my_csv 
from 
  '/srv/vendor-name/my.csv'
with (
  format csv
);

Upvotes: 3

Estevão Lucas

Reputation: 4688

Yes, there is.

You can use OpenRefine (or Google Refine). OpenRefine is like a spreadsheet on steroids.

The file size that you can manipulate depends on your computer's memory.

Upvotes: 9

Mohammad Alqudah

Reputation: 23

I had a file with ~100 million records. I used the Linux command line to view the file (just to take a look).

$ more myBigFile.CSV

or

$ nano myBigFile.CSV

It worked with a 6 GB file.

Upvotes: 0

Gunnar Bernstein

Reputation: 6222

I had the same problem with a CSV file of over 3 million lines. It could not be opened in OpenOffice Calc, Writer, or Notepad++.

I then used OpenOffice 4 Base as a poor man's solution, since it can link to a CSV file. Short description (the wording may not be exact, as I use the German OpenOffice):

  1. Prepare: your file needs a .csv extension, and the first line should contain the field names. Put the file in a new directory as the only file there to avoid confusion; otherwise all files in the directory will be imported.
  2. File - New - Database. The assistant should come up.
  3. Connect to an existing database, format TEXT (your file needs to have a .csv extension).
  4. Next. Choose the path to the file (oddly, not the file itself). Choose CSV. Choose the correct field delimiters.
  5. Next and Finish.
  6. Choose a name for your newly created database.

If everything is right you now see the table view with your newly created table.

You can also use gVim to view the file like in Notepad, e.g. to add the first line with the column descriptions.

You may create queries on this table. As the table has no indexes, it is quite slow. Since OpenOffice does not show an hourglass cursor, it may seem as if the system has crashed.
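
Queries on the linked table are plain SQL; a hypothetical filter (the table and column names are made up for illustration) looks like this:

/* filter the linked CSV table; expect a full scan, since there are no indexes */
SELECT * FROM "myBigFile" WHERE "profession" = 'roofer';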

Base is very limited and feels like an early beta. Creating new tables in that database is not possible (thus no INSERT query that selects from the text file).

Export to CSV is not possible. Reasonably sized query results can be copied and pasted into Calc, though this is time-consuming.

Upvotes: 2

RangerRick

Reputation: 51

Sure, there are quite a few spreadsheet-like tools that support big data, with IBM BigSheets being a major example.

For an online product with a free trial period, I'd recommend Datameer; I've had relatively good success with them.

Upvotes: 1
