Reputation: 246
I launched a Spark/Cassandra cluster with DataStax DSE in the AWS cloud. My dataset is stored in S3, but I don't know how to transfer the data from S3 to my Cassandra cluster. Please help.
Upvotes: 0
Views: 2063
Reputation: 1543
What @phact described uses the Spark API that ships with DataStax Enterprise, which is very useful if there is ETL work to do along with the loading.
For loading only, you can use the sstableloader bulk loading capability. Note that sstableloader streams existing SSTable files into the cluster, so the S3 data would first have to be written out as SSTables. Here's a tutorial to get you started.
Upvotes: 1
Reputation: 7305
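The details depend on your file format and C* data model, but it might look something like this:
Read the file from S3 into an RDD:
val rdd = sc.textFile("s3n://mybucket/path/filename.txt.gz")
Manipulate the RDD so that each element matches the columns of the target table, for example by mapping each line to a (key, value) tuple.
Write the RDD to a Cassandra table:
import com.datastax.spark.connector._  // brings saveToCassandra and SomeColumns into scope
rdd.saveToCassandra("test", "kv", SomeColumns("key", "value"))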
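The "manipulate" step is left open above because it depends on the file format. As a minimal sketch only, assuming each line of the file looks like `key,value` with an integer value (the line format, the `KvLineParser` object and the `parseLine` name are all hypothetical, not part of the original answer), the parsing could be a plain function that needs no Spark to test:

```scala
import scala.util.Try

object KvLineParser {
  // Assumed (hypothetical) input format: "key,value" where value is an integer.
  // Returns None for malformed lines so they can be dropped with flatMap.
  def parseLine(line: String): Option[(String, Int)] =
    line.split(",", -1) match {
      // Exactly two fields and a non-empty key: try to parse the value.
      case Array(k, v) if k.trim.nonEmpty =>
        Try(v.trim.toInt).toOption.map(n => (k.trim, n))
      // Wrong number of fields or empty key: skip the line.
      case _ => None
    }
}
```

In the Spark shell this would probably be plugged in as `sc.textFile("s3n://mybucket/path/filename.txt.gz").flatMap(line => KvLineParser.parseLine(line))`, and the resulting RDD of tuples is what gets passed to `saveToCassandra("test", "kv", SomeColumns("key", "value"))`. Dropping bad lines silently is a design choice; if malformed input should fail the job, return the tuple directly and let the exception propagate.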
Upvotes: 1