Reputation: 1
I'm currently trying to move and capture (CDC) several tables from MySQL to PostgreSQL using AWS DMS. The initial replication runs well, but the issue is with the PostgreSQL UUID type during CDC. For example, Let's say I have these source table A (MySQL) and dest table B (PostgreSQL):
CREATE TABLE `A` ( -- in MySQL
user_id VARCHAR(36), -- UUID STRING like 550e8400-e29b-41d4-a716-446655440000
...
)
CREATE TABLE `B` ( -- in PostgreSQL
user_id UUID,
...
)
The first phase (historical replication) is successful, but the on-going-replication (CDC) fails and leaves logs in the PostgreSQL's awsdms_apply_exceptions table. The statement and error from 'awsdms_apply_exceptions' look like this:
STATEMENT
INSERT INTO "public"."B" (user_id, ...) VALUES ('550e8400-e29b-41d4-a716-446655440000', ...)
ERROR
RetCode: SQL_ERROR SqlState: 22P02 NativeError: 1 Message: ERROR: invalid input syntax for type uuid: "5 5 0 e 8 4 0 0 - e 2 9 b - 4 1 d 4 - a 7 1 6 - 4 4 6 6 5 5 4 4 0 0 0 0 ";
Error while executing the query
I've checked that the statement is valid and when run on SQL console / EXECUTE the statement, it runs without any issues. I thought this issue had something to do with AWS DMS's datatype conversion and played with the datatypes, but no success. (MySQL VARBINARY on MySQL, change-data-type to STRING on DMS)
Also, I have a feeling that this might be due to the encoding of the source data, because the source data includes non-alpha numeric characters (Korean, Japanese, Chinese, etc) in other columns than user_id, hence AWS DMS adds an additional ' ' character after each character in user_id when reading? But I haven't found how AWS DMS works internally and a potential way to sort this out.
So, at the moment, the only option I have is either:
I am not really happy with either because of security and maintenance issues. If there's anybody who's experienced the same issue, could you share how you tackled the issue?
Upvotes: 0
Views: 1680
Reputation: 11
I encountered the same issue, and this SO answer was the only reference I could find. So since I figured it out, I hope sharing it here may be helpful for future readers in the same situation.
The issue has to do with data types. DMS uses its own intermediate types for the data conversion. A full list can be found here.
If the source column is a VARCHAR
in MySQL, DMS will interpret this as its internal data type WSTRING
(source).
At PostgreSQL side, there is no explicit mapping for the UUID
type (source). It seems that trying to insert a WSTRING
into a UUID
type field is problematic (for reasons I don't know, there is sadly no source code to inspect).
However, we can easily work around this by adding a data type transformation to the DMS task. The documentation has a nice example.
The DMS transformation rule for your table would look something like this:
{
"rule-type": "transformation",
"rule-id": "123",
"rule-name": "a-userid-transform",
"rule-target": "column",
"object-locator": {
"schema-name": "test",
"table-name": "a",
"column-name": "user_id"
},
"rule-action": "change-data-type",
"data-type": {
"type": "string",
"length": "36",
"scale": ""
}
}
Note that the length field is mandatory. Given your UUID format currently includes dashes, I made it 36. This was the same for us, so it should work. For a source that does not include dashes, 32 would work just as well.
Upvotes: 1