Mirdrack

Reputation: 790

AWS DMS swap files consume all the space

I'm migrating many databases, but the ones larger than 50GB fail during CDC after some time because the replication instance runs out of storage.

I'm using a dms.r5.large replication instance class, and everything runs smoothly until the full load completes. When CDC starts, I get log messages like this:

D:  There are 188 swap files of total size 93156 Mb. Left to process 188 of size 93156 Mb

But the swap files are never removed. The instance keeps accumulating them and eventually runs out of storage.
One thing to note: swap usage in the monitoring metrics is close to zero.
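
To see whether the backlog ever shrinks, the swap-file line quoted above can be pulled out of the task log and tracked over time. This is only a minimal sketch: the regular expression is written against the single line shown here (the wording may differ in other DMS versions), and `parse_swap_line` and `backlog_is_growing` are hypothetical helper names, not part of any AWS tooling.

```python
import re

# Pattern built from the one log line quoted in the question;
# other DMS versions may word this message differently.
SWAP_LINE = re.compile(
    r"There are (?P<files>\d+) swap files of total size (?P<total_mb>\d+) Mb\. "
    r"Left to process (?P<left_files>\d+) of size (?P<left_mb>\d+) Mb"
)


def parse_swap_line(line):
    """Return the swap-file counters from one log line, or None if it does not match."""
    match = SWAP_LINE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def backlog_is_growing(lines):
    """True if the 'left to process' size never shrinks and ends higher than it started."""
    sizes = [p["left_mb"] for p in map(parse_swap_line, lines) if p is not None]
    if len(sizes) < 2:
        return False
    never_shrinks = all(later >= earlier for earlier, later in zip(sizes, sizes[1:]))
    return never_shrinks and sizes[-1] > sizes[0]


sample = "D:  There are 188 swap files of total size 93156 Mb. Left to process 188 of size 93156 Mb"
print(parse_swap_line(sample))
```

If the "left to process" value only ever goes up, changes are being captured faster than they are applied to the target.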

I have already tried a dms.r5.xlarge and saw the same issue, which makes me think memory is not the problem.

Do you know what could be the cause of this behavior? Is there a way to debug this?

Thank you!

More useful data:
Replication instance class: dms.r5.large (I have also tried dms.r5.xlarge).
Storage: 40GB. I have also tried 300GB, but the CDC phase eventually consumes all of it.
The database to migrate is about 80GB.
Task settings:

{
    "TargetMetadata": {
        "TargetSchema": "",
        "SupportLobs": true,
        "FullLobMode": false,
        "LobChunkSize": 0,
        "LimitedSizeLobMode": true,
        "LobMaxSize": 32,
        "InlineLobMaxSize": 0,
        "LoadMaxFileSize": 0,
        "ParallelLoadThreads": 0,
        "ParallelLoadBufferSize": 0,
        "BatchApplyEnabled": false,
        "TaskRecoveryTableEnabled": false,
        "ParallelLoadQueuesPerThread": 0,
        "ParallelApplyThreads": 0,
        "ParallelApplyBufferSize": 0,
        "ParallelApplyQueuesPerThread": 0
    },
    "FullLoadSettings": {
        "TargetTablePrepMode": "DROP_AND_CREATE",
        "CreatePkAfterFullLoad": false,
        "StopTaskCachedChangesApplied": false,
        "StopTaskCachedChangesNotApplied": false,
        "MaxFullLoadSubTasks": 8,
        "TransactionConsistencyTimeout": 600,
        "CommitRate": 10000
    },
    "Logging": {
        "EnableLogging": true,
        "LogComponents": [{
            "Id": "SOURCE_UNLOAD",
            "Severity": "LOGGER_SEVERITY_DEFAULT"
        },{
            "Id": "SOURCE_CAPTURE",
            "Severity": "LOGGER_SEVERITY_DEFAULT"
        },{
            "Id": "TARGET_LOAD",
            "Severity": "LOGGER_SEVERITY_DEFAULT"
        },{
            "Id": "TARGET_APPLY",
            "Severity": "LOGGER_SEVERITY_INFO"
        },{
            "Id": "TASK_MANAGER",
            "Severity": "LOGGER_SEVERITY_DEBUG"
        }]
    },
    "ControlTablesSettings": {
        "historyTimeslotInMinutes": 5,
        "ControlSchema": "",
        "HistoryTimeslotInMinutes": 5,
        "HistoryTableEnabled": false,
        "SuspendedTablesTableEnabled": false,
        "StatusTableEnabled": false
    },
    "StreamBufferSettings": {
        "StreamBufferCount": 3,
        "StreamBufferSizeInMB": 8,
        "CtrlStreamBufferSizeInMB": 5
    },
    "ChangeProcessingDdlHandlingPolicy": {
        "HandleSourceTableDropped": true,
        "HandleSourceTableTruncated": true,
        "HandleSourceTableAltered": true
    },
    "ErrorBehavior": {
        "DataErrorPolicy": "LOG_ERROR",
        "DataTruncationErrorPolicy": "LOG_ERROR",
        "DataErrorEscalationPolicy": "SUSPEND_TABLE",
        "DataErrorEscalationCount": 0,
        "TableErrorPolicy": "SUSPEND_TABLE",
        "TableErrorEscalationPolicy": "STOP_TASK",
        "TableErrorEscalationCount": 0,
        "RecoverableErrorCount": -1,
        "RecoverableErrorInterval": 5,
        "RecoverableErrorThrottling": true,
        "RecoverableErrorThrottlingMax": 1800,
        "RecoverableErrorStopRetryAfterThrottlingMax": false,
        "ApplyErrorDeletePolicy": "IGNORE_RECORD",
        "ApplyErrorInsertPolicy": "LOG_ERROR",
        "ApplyErrorUpdatePolicy": "LOG_ERROR",
        "ApplyErrorEscalationPolicy": "LOG_ERROR",
        "ApplyErrorEscalationCount": 0,
        "ApplyErrorFailOnTruncationDdl": false,
        "FullLoadIgnoreConflicts": true,
        "FailOnTransactionConsistencyBreached": false,
        "FailOnNoTablesCaptured": false
    },
    "ChangeProcessingTuning": {
        "BatchApplyPreserveTransaction": true,
        "BatchApplyTimeoutMin": 1,
        "BatchApplyTimeoutMax": 30,
        "BatchApplyMemoryLimit": 500,
        "BatchSplitSize": 0,
        "MinTransactionSize": 1000,
        "CommitTimeout": 1,
        "MemoryLimitTotal": 1024,
        "MemoryKeepTime": 60,
        "StatementCacheSize": 50
    },
    "ValidationSettings": {
        "EnableValidation": true,
        "ValidationMode": "ROW_LEVEL",
        "ThreadCount": 5,
        "PartitionSize": 10000,
        "FailureMaxCount": 10000,
        "RecordFailureDelayInMinutes": 5,
        "RecordSuspendDelayInMinutes": 30,
        "MaxKeyColumnSize": 8096,
        "TableFailureMaxCount": 1000,
        "ValidationOnly": false,
        "HandleCollationDiff": false,
        "RecordFailureDelayLimitInMinutes": 0,
        "SkipLobColumns": false,
        "ValidationPartialLobSize": 0,
        "ValidationQueryCdcDelaySeconds": 0
    },
    "PostProcessingRules": null,
    "CharacterSetSettings": null,
    "LoopbackPreventionSettings": null,
    "BeforeImageSettings": null
}

Upvotes: 3

Views: 7081

Answers (2)

ASHISH RAJ

Reputation: 1

You should also increase the memory-related values in the task settings, for example:

  • MemoryLimitTotal
  • BatchApplyMemoryLimit
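
A minimal sketch of what that change could look like, assuming the task settings have already been loaded into a Python dict. The helper name `raise_memory_limits` and the factor of 2 are illustrative assumptions only, not recommended values; the starting numbers are the ones from the question's settings.

```python
import copy
import json


def raise_memory_limits(settings, factor=2):
    """Return a copy of the task settings with the two memory limits multiplied by factor."""
    patched = copy.deepcopy(settings)  # leave the caller's dict untouched
    tuning = patched.setdefault("ChangeProcessingTuning", {})
    for key in ("MemoryLimitTotal", "BatchApplyMemoryLimit"):
        if key in tuning:
            tuning[key] = int(tuning[key] * factor)
    return patched


# Values copied from the ChangeProcessingTuning section in the question.
current = {"ChangeProcessingTuning": {"MemoryLimitTotal": 1024, "BatchApplyMemoryLimit": 500}}
patched = raise_memory_limits(current)
print(json.dumps(patched, indent=4))
```

The printed JSON fragment would then be merged back into the task settings before restarting the task.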

Upvotes: 0

Mirdrack

Reputation: 790

The issue was caused by high target latency, and the root cause was the structure of the database tables.

Tables with a large number of rows had no primary key or unique identifier, so applying changes on the target required full-table scans. The changes could not be applied fast enough and were saved in the replication instance's storage instead.

Eventually the instance ran out of storage.

To fix this, run a premigration assessment to check whether your database is suitable for a DMS migration.

Another option is to add an extra column that provides a unique key for the migration, and remove it after the migration.
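
As a rough illustration of the kind of check involved, the sketch below lists tables that have neither a primary key nor a unique index. It uses an in-memory SQLite database purely so the example is runnable; that is an assumption for demonstration, not the engine from this migration. On MySQL or PostgreSQL the same idea would be expressed as a query against the catalog tables. The function name and the sample tables are hypothetical.

```python
import sqlite3


def tables_without_unique_key(conn):
    """Return the names of user tables that have no primary key and no unique index."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    missing = []
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info: column 5 (pk) is > 0 for primary-key columns.
        has_pk = any(row[5] > 0 for row in conn.execute(f"PRAGMA table_info({quoted})"))
        # PRAGMA index_list: column 2 (unique) is 1 for unique indexes.
        has_unique = any(row[2] == 1 for row in conn.execute(f"PRAGMA index_list({quoted})"))
        if not (has_pk or has_unique):
            missing.append(table)
    return sorted(missing)


# Hypothetical sample schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")
conn.execute("CREATE TABLE events (payload TEXT)")
print(tables_without_unique_key(conn))
```

Any table this reports is a candidate for a temporary unique column before starting the CDC phase.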

Upvotes: 3
