How do I import a dataset from S3 into Cassandra?

I launched a Spark and Cassandra cluster with DataStax Enterprise (DSE) in the AWS cloud. My dataset is stored in S3, but I don't know how to transfer the data from S3 to my Cassandra cluster. Please help.

Upvotes: 0

Views: 2063

Answers (2)

Anthony

Reputation: 1543

What @phact described uses the Spark API that comes with DataStax Enterprise, which can be very useful if there is ETL work to do along with the loading. For loading only, you can use the sstableloader bulk-loading capability. Here's a tutorial to get you started.

Upvotes: 1

phact

Reputation: 7305

The details depend on your file format and C* data model, but it might look something like this:

  • Read the file from S3 into an RDD:

    val rdd = sc.textFile("s3n://mybucket/path/filename.txt.gz")

  • Manipulate the RDD so that each element matches the columns of the target table

  • Write the RDD to a Cassandra table:

    import com.datastax.spark.connector._  // brings saveToCassandra and SomeColumns into scope
    rdd.saveToCassandra("test", "kv", SomeColumns("key", "value"))
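
Put together, the three steps above might look roughly like the following standalone job. This is only a sketch under several assumptions that are not in the original answer: each input line is taken to have the form `key,value`, the table `test.kv` is assumed to already exist with two text columns `key` and `value`, the S3 credentials are read from the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, and the object name `S3ToCassandra` and the contact point `127.0.0.1` are placeholders. Adjust the parsing to your real file format.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Hypothetical job name; not part of the original answer.
object S3ToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("s3-to-cassandra")
      // Assumption: a Cassandra node is reachable at this address.
      // When submitting through DSE this is usually configured for you.
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Assumption: credentials for the s3n:// filesystem come from the environment.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Step 1: read the (gzip-compressed) text file from S3 into an RDD of lines.
    val lines = sc.textFile("s3n://mybucket/path/filename.txt.gz")

    // Step 2: manipulate the RDD. Here: split each line into a (key, value)
    // tuple and drop lines that do not contain a comma (assumed format).
    val pairs = lines
      .map(_.split(",", 2))
      .filter(_.length == 2)
      .map(fields => (fields(0).trim, fields(1).trim))

    // Step 3: write the tuples to the existing table test.kv,
    // mapping tuple positions to the named columns.
    pairs.saveToCassandra("test", "kv", SomeColumns("key", "value"))

    sc.stop()
  }
}
```

In the DSE Spark shell a `SparkContext` named `sc` is already available, so only the lines from the `sc.textFile` call through `saveToCassandra` would be needed there. Note that a single `.gz` file is not splittable, so it is read by one task; splitting the dataset into several files lets Spark load them in parallel.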

Upvotes: 1
