Reputation: 155
I have a very large 3.5 GB CSV file that I'd like to be able to read, sort, and filter based on various inputs. I'm pretty sure I can just import it into a MySQL database and go from there, but is there any program or online tool where you simply upload the CSV and the rest is automatic?
Upvotes: 9
Views: 53114
Reputation: 1
I'm loading files with 8 million records and 400 fields in each row.
BULK INSERT:
BULK INSERT DB.dbo.TableName
FROM 'C:\Path....\FileName.csv'
WITH (FORMAT = 'CSV', FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', FIRSTROW = 2);
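Note that BULK INSERT requires the target table to exist first. A minimal sketch of what that might look like, with hypothetical column names (a real table here would list all 400 fields):
-- hypothetical columns; match names and types to the CSV
CREATE TABLE DB.dbo.TableName (
    Field1 NVARCHAR(255),
    Field2 NVARCHAR(255)
    -- ...one column per CSV field, 400 in this case
);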
Upvotes: 0
Reputation: 571
You can try Acho. It's an online tool and offers a free trial as well. I recommend it because its interface is clean and intuitive, and it has all the features you mentioned, including sorting and filtering values. Basically, I use it to shrink the size of a dataset and then export it to Python for further analysis.
Upvotes: 0
Reputation: 1129
You could use Excel's built-in data connection to do this.
Original source: https://excel.officetuts.net/en/examples/open-large-csv
Steps (roughly; the exact wording depends on your Excel version):
1. Open a blank workbook and go to Data → Get Data → From File → From Text/CSV.
2. Select the CSV file and click Import.
3. In the preview dialog, choose Load To… and load the data as a connection / to the Data Model rather than to a worksheet, which avoids the 1,048,576-row worksheet limit.
4. Sort and filter via a PivotTable built on that connection.
Upvotes: 0
Reputation: 21
If it's a flat .CSV file and it doesn't involve a data pipeline, I'm not exactly sure what you mean by "the rest is automatic".
For accessing larger .CSV files, the typical solutions are:
1. Import it into a database (see the sketch after this list). You'll need to design a table schema, find a server to host the database, and write server-side code to maintain or change the data.
2. Load it in Python or R. Running GBs of data through either will put a lot of stress on your local computer, and both are better suited to data exploration and analytics than to table manipulation.
3. Use a data hub. This is much easier, though costs vary, and it comes with a GUI that lets you sort and filter through a table pretty easily.
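A minimal sketch of the database route, using the MySQL import the question mentions; the table name, columns, and file path here are hypothetical:
-- hypothetical schema: adjust columns and types to match the CSV
CREATE TABLE big_csv (
    id INT,
    name VARCHAR(255),
    value DECIMAL(10,2)
);

-- LOCAL reads the file from the client machine and requires
-- local_infile to be enabled; IGNORE 1 LINES skips a header row
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE big_csv
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;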
Upvotes: 0
Reputation: 889
CSV Explorer is an online tool to read, sort, and filter CSVs with millions of rows. Upload the CSV and it will automatically import it and let you start working with the data.
Upvotes: 1
Reputation: 5914
Since it is a CSV file.
Upvotes: 6
Reputation: 48277
You could try PostgreSQL 9.1+ and its file_fdw (File Foreign Data Wrapper) which would pretend that the CSV file is a table. If you replaced the CSV file with another CSV file of the same name, then you would see the new info immediately in the database.
You can improve performance by using a materialized view (PG 9.3+) which essentially creates a real database table from the CSV data. You could use pgAgent to refresh the materialized view on a schedule.
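A minimal sketch of the file_fdw route, reusing the file path and column from the COPY example below; the server and table names are hypothetical:
/* one-time setup; creating the extension needs superuser */
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;

/* the foreign table re-reads the CSV file on every query */
CREATE FOREIGN TABLE my_csv_ft (
    some_field text
    /* ...one column per CSV field */
) SERVER csv_server
OPTIONS (filename '/srv/vendor-name/my.csv', format 'csv');

/* materialize it for performance (PG 9.3+); refresh on a schedule, e.g. with pgAgent */
CREATE MATERIALIZED VIEW my_csv_mv AS SELECT * FROM my_csv_ft;
REFRESH MATERIALIZED VIEW my_csv_mv;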
Another alternative would be to use the COPY statement:
/* the columns in this table are the same as the columns in your csv: */
create table if not exists my_csv (
    some_field text, ...
);

/* COPY appends, so truncate the table if loading fresh data again: */
truncate table my_csv;

/*
   you need to be a postgres superuser to use COPY;
   use psql \copy if you can't be superuser;
   put the csv file in /srv/vendor-name/
*/
copy my_csv
from '/srv/vendor-name/my.csv'
with (format csv);
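If you can't be a superuser, the client-side equivalent mentioned in the comment is the psql meta-command, run from inside psql (it reads the file from the client machine):
\copy my_csv from '/srv/vendor-name/my.csv' with (format csv)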
Upvotes: 3
Reputation: 4688
Yes, there is.
You can use OpenRefine (formerly Google Refine). OpenRefine is like a spreadsheet on steroids.
The file size that you can manipulate depends on your computer's memory.
Upvotes: 9
Reputation: 23
I had a file with ~100 million records; I used the Linux command line just to take a look at the files.
$ more myBigFile.CSV
or
$ nano myBigFile.CSV
It worked with a 6 GB file.
Upvotes: 0
Reputation: 6222
I had the same problem with a CSV file of over 3 million lines: it could not be opened in OpenOffice Calc, Writer, or Notepad++.
Then I used OpenOffice 4 Base as a poor man's solution, since it can link to a csv. Short description (wording may not be exact, as I use the German OpenOffice):
1. Create a new Base database and choose "Connect to an existing database" with "Text" as the type.
2. Point the wizard at the directory containing the csv file and set the field separator.
3. Finish the wizard and register the database.
If everything is right you now see the table view with your newly created table.
You can also use gVim to view the file like in Notepad, e.g. to add the first line with the column descriptions.
You may create queries on this table. As the table has no indexes, it is quite slow, and since OpenOffice does not show an hourglass, it may seem the system has crashed.
Base is very limited and feels like an early beta. Creating new tables in that database is not possible (thus no insert query to select from the text file).
Export to csv is not possible, but reasonably sized query results can be copied and pasted into Calc, which is time-consuming.
Upvotes: 2
Reputation: 51
Sure, there are quite a few spreadsheet-like tools that support big data, with IBM BigSheets being a major example.
For an online product with a free trial period, I'd recommend Datameer; I've had relatively good success with them.
Upvotes: 1