devThoughts

Reputation: 830

RPC timeout error while exporting data from CQL

I am trying to export data from Cassandra using the CQL client. A column family has about 100,000 rows in it. When I copy the data into a CSV file using the COPY TO command, I get the following rpc_timeout error:

copy mycolfamily to '/root/mycolfamily.csv'
Request did not complete within rpc_timeout.

I am running:

[cqlsh 3.1.6 | Cassandra 1.2.8 | CQL spec 3.0.0 | Thrift protocol 19.36.0]

How can I increase RPC timeout limit?

I tried adding rpc_timeout_in_ms: 20000 (the default is 10000) to my conf/cassandra.yaml file, but while restarting Cassandra I get:

[root@user ~]# null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=rpc_timeout_in_ms for JavaBean=org.apache.cassandra.config.Config@71bfc4fc; Unable to find property 'rpc_timeout_in_ms' on class: org.apache.cassandra.config.Config
Invalid yaml; unable to start server.  See log for stacktrace.
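From the cassandra.yaml that ships with 1.2, it looks like the single rpc_timeout_in_ms setting was split into per-operation timeouts, which would explain the unknown-property error. If so, the change I was attempting would look roughly like this (key names are my assumption, taken from the 1.2 yaml):

read_request_timeout_in_ms: 20000     # single-partition reads
range_request_timeout_in_ms: 20000    # range scans, which COPY TO relies on
write_request_timeout_in_ms: 20000
request_timeout_in_ms: 20000          # default for other operations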

Upvotes: 5

Views: 6012

Answers (4)

user3111988

Reputation: 46

The best way to export the data is to use the nodetool snapshot option. This returns immediately and the snapshot can be restored later on. The only issue is that this export is per node and for the entire cluster.

Example: nodetool -h localhost -p 7199 snapshot
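For instance, a snapshot of just the keyspace holding this column family could be taken as sketched below (the keyspace name and the data directory path are assumptions; the actual path depends on data_file_directories in cassandra.yaml):

nodetool -h localhost -p 7199 snapshot mykeyspace
# the SSTable copies end up under the data directory, e.g.
# /var/lib/cassandra/data/mykeyspace/mycolfamily/snapshots/<snapshot-name>/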

See reference: http://docs.datastax.com/en/archived/cassandra/1.1/docs/backup_restore.html#taking-a-snapshot

Upvotes: 0

safato

Reputation: 118

I encountered the same problem a few minutes ago, then I found CAPTURE and it worked:

First start capturing in cqlsh, then run your query with a LIMIT of your choice.

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/capture_r.html
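A minimal sketch of that flow in cqlsh (the output path and LIMIT are just examples):

CAPTURE '/root/mycolfamily.csv';
SELECT * FROM mycolfamily LIMIT 10000;
CAPTURE OFF;

Note that CAPTURE writes the formatted query output rather than strict CSV, so some post-processing of the file may be needed.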

Upvotes: 1

akshat thakar

Reputation: 1526

You can use automatic pagination by specifying a fetch size in the DataStax Java driver:

Statement stmt = new SimpleStatement("SELECT id FROM mycolfamily;");
stmt.setFetchSize(500);
ResultSet result = session.execute(stmt);
for (Row r : result) {   // rows are fetched 500 at a time as you iterate
    // write the row to file here
}
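The snippet above assumes an existing session; a minimal setup sketch with the Java driver (contact point and keyspace are assumptions) would be:

Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
Session session = cluster.connect("mykeyspace");
// ... run the paged query above and write rows out ...
cluster.shutdown();   // close() in newer driver versions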

Upvotes: 2

aacanakin

Reputation: 2913

The COPY command currently does the same thing as a SELECT with LIMIT 99999999, so it will eventually hit the timeout as your data grows. Here's the export function:

https://github.com/apache/cassandra/blob/trunk/bin/cqlsh#L1524

I'm doing the same export in production. What I do is the following:

  • run select * from table where timeuuid > someTimeuuid limit 10000
  • write the result set to a csv file in >> (append) mode
  • run the next select using the last timeuuid of the previous chunk as the new lower bound

You can pipe the CQL into cqlsh with a command like this:

echo "{$cql}" | /usr/bin/cqlsh -u user -p password localhost 9160 > file.csv

Upvotes: 5
