How to COPY a large Cassandra table without running out of memory?

Question

I am trying to run a simple Cassandra COPY script like the example below (or a very similar variation):

COPY my_keyspace_name.my_table_name TO 'cassandra_dump/my_keyspace_name.my_table_name.csv' WITH HEADER=true AND PAGETIMEOUT=40 AND PAGESIZE=20 AND DELIMITER='|';
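dump_cassandra.sh itself is not shown here, so the following is only a rough, hypothetical sketch of a per-table dump driver written in Python. It assumes `cqlsh` is on PATH, connects with its default settings, and accepts a statement through `-e`. The helper names `build_copy_statement` and `dump_table` are illustrative, not part of any Cassandra tooling. The builder reproduces the COPY statement above exactly. The runner sends cqlsh's output straight to a log file rather than collecting it in a variable.

```python
import subprocess
from pathlib import Path


def build_copy_statement(keyspace, table, out_dir="cassandra_dump",
                         page_timeout=40, page_size=20, delimiter="|"):
    """Hypothetical helper: return a cqlsh COPY ... TO statement for one table,
    using the same options as the statement in the question."""
    target = f"{out_dir}/{keyspace}.{table}.csv"
    return (
        f"COPY {keyspace}.{table} TO '{target}' "
        f"WITH HEADER=true AND PAGETIMEOUT={page_timeout} "
        f"AND PAGESIZE={page_size} AND DELIMITER='{delimiter}';"
    )


def dump_table(keyspace, table, log_path="dump.log", out_dir="cassandra_dump"):
    """Hypothetical helper: run one COPY through cqlsh, appending its output to a log.

    Assumes `cqlsh` is on PATH and reaches the cluster with default settings.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    statement = build_copy_statement(keyspace, table, out_dir=out_dir)
    with open(log_path, "a") as log:
        log.write(f"Executing the following query:\n{statement}\n")
        log.flush()  # keep our own line ahead of cqlsh's output in the log
        # cqlsh writes directly into the log file; nothing is buffered in Python.
        subprocess.run(["cqlsh", "-e", statement], stdout=log,
                       stderr=subprocess.STDOUT, check=True)
```

Usage would be something like `dump_table("my_keyspace_name", "my_table_name")` for each table. `check=True` makes a failed COPY raise `CalledProcessError` instead of being silently ignored.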

It works on every table except my largest one, which fails with an error saying it cannot allocate enough memory. The table is nowhere near as large as the allocation the error message reports (its data is less than 1 GB).

749314 rows exported to 1 files in 9 minutes and 11.240 seconds.

./dump_cassandra.sh: xmalloc: ../../.././lib/sh/strtrans.c:63: cannot allocate 18446744072166431589 bytes (6442528768 bytes allocated)

stdout:
[Thu May 17 13:41:47 UTC 2018] Executing the following query:
COPY my_keyspace_name.my_table_name TO 'cassandra_dump/my_keyspace_name.my_table_name.csv' WITH HEADER=true AND PAGETIMEOUT=40 AND PAGESIZE=20 AND DELIMITER='|';

This answer seemed promising, but unfortunately it does not work for me.

Am I missing something that prevents COPY from succeeding on a (relatively) large table?

--

EDIT: This error seems to be environmental. I have had mixed results on different servers with nearly identical amounts of data.

Ben Harrison · Accepted Answer

Setting MAXOUTPUTSIZE (the maximum number of lines per output file) splits the exported data across multiple files, and with it set this error no longer occurs:

COPY my_keyspace_name.my_table_name TO 'cassandra_dump/my_keyspace_name.my_table_name.csv' WITH HEADER=true AND PAGETIMEOUT=40 AND MAXOUTPUTSIZE=100000 AND DELIMITER='|';
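If a single CSV is needed afterwards, the segments can be stitched back together by streaming them in order. Below is a minimal Python sketch that makes two assumptions. First, it assumes the segments are named after the target file plus a numeric suffix (for example `my_keyspace_name.my_table_name.csv`, then `.csv.1`, `.csv.2`, ...), so check the actual names in your output directory. Second, it assumes only the first segment carries the header row. `segment_files` and `merge_segments` are hypothetical helpers.

```python
import re
import shutil
from pathlib import Path


def segment_files(first_file):
    """Return first_file followed by its numbered segments (first_file.1, .2, ...),
    sorted numerically so that .10 comes after .9 rather than after .1."""
    first = Path(first_file)
    pattern = re.compile(re.escape(first.name) + r"\.(\d+)$")
    numbered = []
    for path in first.parent.iterdir():
        match = pattern.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    numbered.sort(key=lambda item: item[0])
    return [first] + [path for _, path in numbered]


def merge_segments(first_file, merged_file):
    """Concatenate every segment, in order, into merged_file.

    Assumes only the first segment starts with a header row. Data is copied
    in chunks, so memory use stays small regardless of the table size.
    """
    with open(merged_file, "wb") as out:
        for path in segment_files(first_file):
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
```

The numeric sort matters: a plain lexicographic sort of the file names would put `.csv.10` between `.csv.1` and `.csv.2`. For scale, with MAXOUTPUTSIZE=100000 the 749,314 rows from the question would be expected to span eight segments.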
