MANISHDAN LANGA

Reputation: 2237

Store a large CSV file which contains 2 million log records

I have a very large CSV file containing 2 million log records per customer, arriving every day. We have to develop an analytics tool that produces summaries over various GROUP BY aggregations of the CSV data.

We developed it using MySQL (InnoDB), but it runs very slowly. We have applied proper indexing on the tables, and the hardware is also good.

Is MySQL capable of powering this kind of analytical tool, or do I need to look at another database?

Each SQL SELECT query takes 15-20 seconds to return results from a single table.

Upvotes: 0

Views: 502

Answers (1)

Deepak Puthraya

Reputation: 1435

I am assuming that the data you have is insert-only and that you are mostly looking to build dashboards that show some metrics to clients.

You can approach this problem in a different way. Instead of storing the raw CSV data directly in the SQL database, you can process the CSV first using Spark, Spring Batch, or Airflow, depending on your language options. Doing this lets you reduce the amount of data that you have to store.
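
For example, here is a minimal PySpark sketch of that pre-aggregation step. The column names (customer_id, event_type, log_date) and the file paths are assumptions about your schema, not values from your question:

    # Minimal PySpark sketch -- assumes hypothetical columns
    # customer_id, event_type, log_date in the daily CSV.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("log-preaggregation").getOrCreate()

    raw = spark.read.csv("logs_2019-07-01.csv", header=True, inferSchema=True)

    # Collapse 2 million raw rows into one summary row per
    # customer / event type / day before anything reaches MySQL.
    summary = (
        raw.groupBy("customer_id", "event_type", "log_date")
           .agg(F.count("*").alias("event_count"))
    )

    # Write the (much smaller) summary out; it could equally be
    # written to MySQL through Spark's JDBC data source.
    summary.write.mode("overwrite").parquet("daily_summary/")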

Another approach you can consider is processing the CSVs and pushing them to something like BigQuery or Redshift. These databases are designed to process and query large datasets.
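
As a rough sketch, loading the daily file into BigQuery with the google-cloud-bigquery Python client could look like this; the bucket, project, dataset, and table names are placeholders:

    # Sketch: bulk-load the daily CSV into BigQuery, then run the
    # GROUP BY there. All names/paths below are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # header row
        autodetect=True,       # infer the schema from the file
    )

    load_job = client.load_table_from_uri(
        "gs://my-bucket/logs_2019-07-01.csv",
        "my_project.analytics.raw_logs",
        job_config=job_config,
    )
    load_job.result()  # wait for the load to finish

    # The aggregations that take 15-20 s in InnoDB are exactly the
    # kind of full-table scan these engines are built for.
    query = """
        SELECT customer_id, event_type, COUNT(*) AS event_count
        FROM `my_project.analytics.raw_logs`
        GROUP BY customer_id, event_type
    """
    for row in client.query(query).result():
        print(row.customer_id, row.event_type, row.event_count)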

To speed up queries, you can also create materialized views to build dashboards quickly. I would not recommend this, though, as it is not a very scalable approach.
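
For completeness, in BigQuery such a materialized view would look roughly like this (placeholder names again; as noted, treat it as a quick win rather than the long-term design):

    # Sketch: a BigQuery materialized view over the loaded logs.
    # Names are placeholders taken from the example above.
    from google.cloud import bigquery

    client = bigquery.Client()
    ddl = """
        CREATE MATERIALIZED VIEW `my_project.analytics.daily_summary` AS
        SELECT customer_id, event_type, log_date, COUNT(*) AS event_count
        FROM `my_project.analytics.raw_logs`
        GROUP BY customer_id, event_type, log_date
    """
    client.query(ddl).result()  # dashboards can now read the view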

I recommend that you process the data first, generate the metrics that are required, store those in SQL, and build dashboards on top of them instead of saving the raw data directly.
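
Here is a sketch of that pipeline, using pandas in place of whichever batch tool you pick; the columns, table name, and connection string are all placeholders:

    # Sketch of the recommended path: aggregate first, store only
    # the metrics. pandas stands in for Spark/Spring Batch/Airflow.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@localhost/analytics")

    # Stream the 2M-row file in chunks so it never has to fit in memory.
    partials = []
    for chunk in pd.read_csv("logs_2019-07-01.csv", chunksize=200_000):
        partials.append(
            chunk.groupby(["customer_id", "event_type", "log_date"])
                 .size()
                 .rename("event_count")
                 .reset_index()
        )

    # Re-aggregate across chunks, then persist only the summary rows.
    metrics = (
        pd.concat(partials)
          .groupby(["customer_id", "event_type", "log_date"], as_index=False)["event_count"]
          .sum()
    )
    metrics.to_sql("daily_metrics", engine, if_exists="append", index=False)

With this, the dashboard queries hit a small pre-aggregated table instead of scanning 2 million raw rows, which is where your 15-20 second SELECTs are going.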

Upvotes: 1
