Reputation: 47
I want to insert a large amount of data into a Google Cloud Spanner table.
This is what I'm doing with my Node.js app, but it stops because the txt file is too large (almost 2 GB).
1. Load the txt file
2. Read it line by line
3. Split each line by "|"
4. Build a data object
5. Insert the data into the Cloud Spanner table (roughly as sketched below)
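Roughly, the code looks like this (the instance, database, table, and column names below are placeholders):

```js
const {Spanner} = require('@google-cloud/spanner');

const spanner = new Spanner();
const database = spanner
  .instance('my-instance')   // placeholder instance ID
  .database('my-database');  // placeholder database ID
const table = database.table('MyTable'); // placeholder table name

// For each line of the file: split on "|", build a row object,
// and insert that single row.
async function insertLine(line) {
  const [id, name, value] = line.split('|'); // placeholder columns
  await table.insert({Id: id, Name: name, Value: value});
}
```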
MySQL supports data insertion from a .sql file. Does Cloud Spanner support anything similar?
Upvotes: 0
Views: 3177
Reputation: 42018
Cloud Spanner doesn't currently expose a bulk import method. It sounds like you are planning to insert each row individually, which is not the most efficient method. The documentation has best (and bad) practices for efficient bulk loading:
To get optimal write throughput for bulk loads, partition your data by primary key with this pattern:
- Each partition contains a range of consecutive rows.
- Each commit contains data for only a single partition.

A good rule of thumb for your number of partitions is 10 times the number of nodes in your Cloud Spanner instance. So if you have N nodes, with a total of 10*N partitions, you can assign rows to partitions by:

- Sorting your data by primary key.
- Dividing it into 10*N separate sections.
- Creating a set of worker tasks that upload the data.

Each worker will write to a single partition. Within the partition, it is recommended that your worker write the rows sequentially. However, writing data randomly within a partition should also provide reasonably high throughput.
As more of your data is uploaded, Cloud Spanner automatically splits and rebalances your data to balance load on the nodes in your instance. During this process, you may experience temporary drops in throughput.
Following this pattern, you should see a maximum overall bulk write throughput of 10-20 MiB per second per node.
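A minimal sketch of that partitioning pattern in Node.js might look like the following. It assumes the rows are already parsed into objects and that `table` is a Spanner Table handle; the node count, key field, and batch size are assumptions you would tune for your schema. In practice you would combine this with the streaming read described below rather than holding all the rows in memory at once.

```js
const NODE_COUNT = 3;                  // assumption: nodes in your instance
const numPartitions = 10 * NODE_COUNT; // 10 * N partitions

async function bulkLoad(rows, table) {
  // Sort by primary key so each partition covers a consecutive key range.
  rows.sort((a, b) => (a.Id < b.Id ? -1 : a.Id > b.Id ? 1 : 0));

  // Divide the sorted rows into 10 * N consecutive sections.
  const size = Math.ceil(rows.length / numPartitions);
  const partitions = [];
  for (let i = 0; i < rows.length; i += size) {
    partitions.push(rows.slice(i, i + size));
  }

  // One worker per partition; commit in smaller batches so a single
  // commit stays within Spanner's per-commit mutation limits.
  const BATCH = 500; // assumption: tune for your row size
  await Promise.all(
    partitions.map(async (partition) => {
      for (let i = 0; i < partition.length; i += BATCH) {
        await table.insert(partition.slice(i, i + BATCH));
      }
    })
  );
}
```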
It also looks like you are trying to load the entire large file into memory before processing. For files this large, you should load and process chunks rather than the whole thing. I'm not a Node expert, but you should probably try reading it in as a stream instead of keeping everything in memory.
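For example, Node's built-in readline module can iterate the file a line at a time without holding all 2 GB in memory (the file path here is a placeholder):

```js
const fs = require('fs');
const readline = require('readline');

async function processFile() {
  const rl = readline.createInterface({
    input: fs.createReadStream('data.txt'), // placeholder path
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  // The stream yields one line at a time, so memory use stays flat
  // no matter how large the file is.
  for await (const line of rl) {
    const fields = line.split('|');
    // ...build the row object and add it to the current batch...
  }
}
```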
Upvotes: 1