Reputation: 561
I've found some references on here that refer to copying one DynamoDB table to another, but I've had trouble finding anything about changing the primary key while doing so.
Basically I have a schema that looks like this (with drastically different fields/data but the idea is the same):
PK   Author   Text             LastInitial
------------------------------------------
1    Bob      [lots of text]   R
2    Jim      [lots of text]   H
3    Sarah    [lots of text]   J
...
with roughly 280 million rows, about 62 GB in size.
I need to copy it into a new table that looks like this:
PK   Author   Text
--------------------------
1R   Bob      [lots of text]
2H   Jim      [lots of text]
3J   Sarah    [lots of text]
...
So you see, as I'm transferring the data I'm also building a new primary key (PK + LastInitial).
I thought for sure I could do this easily with AWS's Data Pipeline tool, but I can't seem to figure out how to do the transform. It also seems unfortunate that I can't transfer the data directly from one DynamoDB table to another, and that it must go through S3 first.
Is there a slick way of solving this, or do I just need to write a script using the SDK and run it on an EC2 instance?
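For the "write a script using the SDK" route, a minimal boto3 sketch might look like the following. The table names (`SourceTable`, `TargetTable`) are placeholders, and it assumes the attributes are named exactly as in the question; a paginated `scan` plus `batch_writer` keeps memory use bounded, though a single-threaded scan over 280M rows will be slow and you'd likely want a parallel scan in practice.

```python
# Sketch of an SDK-based copy with a key transform.
# Assumptions: source/target table names, and that items carry
# PK, Author, Text, and LastInitial attributes as in the question.

def transform_item(item):
    """Build the new item: the new PK is old PK + LastInitial.

    f-string formatting also handles a numeric PK (boto3 returns
    Decimal for DynamoDB Number attributes).
    """
    return {
        "PK": f'{item["PK"]}{item["LastInitial"]}',
        "Author": item["Author"],
        "Text": item["Text"],
    }

def copy_table(src_name="SourceTable", dst_name="TargetTable"):
    import boto3  # requires AWS credentials configured in the environment
    dynamodb = boto3.resource("dynamodb")
    src = dynamodb.Table(src_name)
    dst = dynamodb.Table(dst_name)

    scan_kwargs = {}
    # batch_writer groups puts into BatchWriteItem calls of up to 25 items
    with dst.batch_writer() as batch:
        while True:
            page = src.scan(**scan_kwargs)
            for item in page["Items"]:
                batch.put_item(Item=transform_item(item))
            if "LastEvaluatedKey" not in page:
                break
            scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

if __name__ == "__main__":
    copy_table()
```

At 62 GB this would run for a long time on one thread; splitting the scan with `Segment`/`TotalSegments` across workers is the usual way to speed it up.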
Upvotes: 1
Views: 4784
Reputation: 101
There might be other ways to deal with this, but you can try using a Glue ETL job to copy data from one table to the other. It's a bit hacky, but it gets the job done fairly easily. Use a Glue crawler to create a Data Catalog entry for the first table, then use the Glue ETL job code suggested here to copy the data into the second table. You can also manipulate the data however you want within the ETL job.
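To illustrate the "manipulate the data within the ETL job" part, the per-record transform in a Glue job can be done with `Map.apply` on a DynamicFrame. The catalog database, table names, and output table name below are assumptions, not values from the question:

```python
# Sketch of the key-merging transform inside a Glue ETL job.
# Assumed names: Data Catalog database "mydb", crawled table
# "source_table", and destination DynamoDB table "TargetTable".

def build_new_key(rec):
    """Map function: fold LastInitial into PK, then drop it."""
    rec["PK"] = f'{rec["PK"]}{rec["LastInitial"]}'
    del rec["LastInitial"]
    return rec

def run_job():
    # Glue-only imports are kept inside the entry point so the
    # transform above stays importable outside a Glue environment.
    from awsglue.context import GlueContext
    from awsglue.transforms import Map
    from pyspark.context import SparkContext

    glue_ctx = GlueContext(SparkContext.getOrCreate())

    # Read the source table via the catalog the crawler built
    dyf = glue_ctx.create_dynamic_frame.from_catalog(
        database="mydb", table_name="source_table")

    # Apply the transform to every record
    dyf = Map.apply(frame=dyf, f=build_new_key)

    # Write straight into the second DynamoDB table
    glue_ctx.write_dynamic_frame_from_options(
        frame=dyf,
        connection_type="dynamodb",
        connection_options={"dynamodb.output.tableName": "TargetTable"})

if __name__ == "__main__":
    run_job()
```

The nice part is that the read and write both talk to DynamoDB directly, so there is no intermediate S3 hop as with Data Pipeline.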
Upvotes: 1