Reputation: 105
I have a table which receives about 600 million to 1 billion rows per day, i.e. roughly 30-40 million records per hour.
Inserting one hour's worth of data into CloudSql takes about 35 minutes on average. What can I do to improve this, performance-wise, configuration-wise, DB-design-wise, or in any other area we might have missed?
Data is inserted from project 1 into project 2: parquet files are read from GCS, converted on a DataProc cluster into a format acceptable to CloudSql, and then inserted into CloudSql, given the limitations MySQL (i.e. CloudSql) naturally imposes on large-volume inserts. I am using a cluster with 20 master nodes. My target MySQL instance is an 8 vCPU CloudSql instance with 52 GB of memory and 600 GB of SSD storage, with innodb_write_io_threads set to 12. The database version is MySQL 5.7.
How can I increase the throughput? Right now one hour's data (about 35 million records) takes about 35-40 minutes to insert. CPU usage is at 90% and memory usage is at 80%.
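For reference, a minimal sketch of what the Spark-to-CloudSql write path could look like with JDBC batching enabled; the bucket path, instance IP, database, table, credentials, and option values are placeholders and would need tuning for the actual workload:

```python
# Hypothetical sketch of the Dataproc -> Cloud SQL write path, assuming the
# parquet data is loaded into a Spark DataFrame and written over JDBC.
# All names, the instance IP, and credentials below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-to-cloudsql").getOrCreate()

# Placeholder GCS path for one hour of parquet data
df = spark.read.parquet("gs://my-bucket/events/dt=2020-01-01-00/")

(df
 .repartition(16)  # cap concurrent writers so the 8-vCPU instance is not overwhelmed
 .write
 .format("jdbc")
 .option("url",
         "jdbc:mysql://10.0.0.5:3306/mydb"
         "?rewriteBatchedStatements=true")  # lets the MySQL driver batch INSERTs into multi-row statements
 .option("dbtable", "events")
 .option("user", "loader")
 .option("password", "...")
 .option("batchsize", 10000)        # rows per JDBC batch
 .option("isolationLevel", "NONE")  # skip transaction isolation overhead for bulk loads
 .mode("append")
 .save())
```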
Upvotes: 1
Views: 350
Reputation: 4457
I do not think CloudSQL and MySQL are designed for this use case. For fully managed big data analytics on GCP you should use the fully managed cloud data warehouse, BigQuery, instead.
Dataproc Spark integrates easily with BigQuery via the Spark BigQuery connector.
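A minimal sketch of that integration, assuming the spark-bigquery connector jar is available on the cluster (e.g. via --jars or spark.jars.packages); the project, dataset, table and staging bucket names are placeholders:

```python
# Sketch: write the same DataFrame to BigQuery with the spark-bigquery connector
# instead of Cloud SQL. Names below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-to-bigquery").getOrCreate()

# Placeholder GCS path for one hour of parquet data
df = spark.read.parquet("gs://my-bucket/events/dt=2020-01-01-00/")

(df.write
   .format("bigquery")
   .option("temporaryGcsBucket", "my-staging-bucket")  # staging bucket for the indirect write path
   .option("table", "my_project.analytics.events")     # placeholder dataset.table
   .mode("append")
   .save())
```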
Upvotes: 1