Reputation: 105
I have a table which receives about 600 million to 1 billion rows per day, i.e. roughly 30-40 million records per hour.
Inserting one hour's worth of data into CloudSql takes about 35 minutes on average. What can I do to improve this, performance-wise, configuration-wise, DB-design-wise, or in any other area we might have missed?
Data is inserted from project 1 into project 2: parquet files are read from GCS, converted on a DataProc cluster into a format acceptable to CloudSql, and then inserted into CloudSql, given the limitations MySQL (i.e. CloudSql) naturally imposes on large-volume inserts. I am using a cluster with 20 master nodes. My target MySQL instance is an 8 vCPU CloudSql instance with 52 GB of memory and 600 GB of SSD storage, with innodb_write_io_threads set to 12. The database version is MySQL 5.7.
How can I increase the throughput? Right now one hour's data (about 35 million records) takes about 35-40 minutes to insert. CPU usage is at 90% and memory usage is at 80%.
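For reference, a minimal sketch of what the Spark-to-CloudSql write path could look like with JDBC batching enabled; the bucket path, instance IP, database, table, credentials, and option values are placeholders and would need tuning for the actual workload:

```python
# Hypothetical sketch of the Dataproc -> Cloud SQL write path, assuming the
# parquet data is loaded into a Spark DataFrame and written over JDBC.
# All names, the instance IP, and credentials below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-to-cloudsql").getOrCreate()

# Placeholder GCS path for one hour of parquet data
df = spark.read.parquet("gs://my-bucket/events/dt=2020-01-01-00/")

(df
 .repartition(16)  # cap concurrent writers so the 8-vCPU instance is not overwhelmed
 .write
 .format("jdbc")
 .option("url",
         "jdbc:mysql://10.0.0.5:3306/mydb"
         "?rewriteBatchedStatements=true")  # lets the MySQL driver batch INSERTs into multi-row statements
 .option("dbtable", "events")
 .option("user", "loader")
 .option("password", "...")
 .option("batchsize", 10000)        # rows per JDBC batch
 .option("isolationLevel", "NONE")  # skip transaction isolation overhead for bulk loads
 .mode("append")
 .save())
```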
Upvotes: 1
Views: 350
Reputation: 4457
I do not think CloudSQL and MySQL are designed for this use case. For fully managed big data analytics on GCP you should use the fully managed cloud data warehouse, BigQuery, instead.
Dataproc Spark integrates easily with BigQuery via the Spark BigQuery connector.
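A minimal sketch of that integration, assuming the spark-bigquery connector jar is available on the cluster (e.g. via --jars or spark.jars.packages); the project, dataset, table and staging bucket names are placeholders:

```python
# Sketch: write the same DataFrame to BigQuery with the spark-bigquery connector
# instead of Cloud SQL. Names below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-to-bigquery").getOrCreate()

# Placeholder GCS path for one hour of parquet data
df = spark.read.parquet("gs://my-bucket/events/dt=2020-01-01-00/")

(df.write
   .format("bigquery")
   .option("temporaryGcsBucket", "my-staging-bucket")  # staging bucket for the indirect write path
   .option("table", "my_project.analytics.events")     # placeholder dataset.table
   .mode("append")
   .save())
```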
Upvotes: 1