Reputation: 116
I am working in Spark. I have many CSV files whose lines look like this:
2017,16,16,51,1,1,4,-79.6,-101.90,-98.900
A line can contain more or fewer fields, depending on the CSV file.
Each file corresponds to a Cassandra table into which I need to insert every line of the file. What I basically do is take a line, split it into its elements, and put them in a List[Double]:
sc.stop
import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val sc = new SparkContext(conf)
val nameTable = "artport"
val ligne = "20171,16,165481,51,1,1,4,-79.6000,-101.7000,-98.9000"
val linetoinsert : List[String] = ligne.split(",").toList
var ainserer : Array[Double] = new Array[Double](linetoinsert.length)
for (l <- 0 until linetoinsert.length) { ainserer(l) = linetoinsert(l).toDouble }
val liste = ainserer.toList
val rdd = sc.parallelize(liste)
rdd.saveToCassandra("db", nameTable) //db is the name of my keyspace in cassandra
When I run my code I get this error:
java.lang.IllegalArgumentException: requirement failed: Columns not found in Double: [collecttime, sbnid, enodebid, rackid, shelfid, slotid, channelid, c373910000, c373910001, c373910002]
at scala.Predef$.require(Predef.scala:224)
at com.datastax.spark.connector.mapper.DefaultColumnMapper.columnMapForWriting(DefaultColumnMapper.scala:108)
at com.datastax.spark.connector.writer.MappedToGettableDataConverter$$anon$1.<init>(MappedToGettableDataConverter.scala:37)
at com.datastax.spark.connector.writer.MappedToGettableDataConverter$.apply(MappedToGettableDataConverter.scala:28)
at com.datastax.spark.connector.writer.DefaultRowWriter.<init>(DefaultRowWriter.scala:17)
at com.datastax.spark.connector.writer.DefaultRowWriter$$anon$1.rowWriter(DefaultRowWriter.scala:31)
at com.datastax.spark.connector.writer.DefaultRowWriter$$anon$1.rowWriter(DefaultRowWriter.scala:29)
at com.datastax.spark.connector.writer.TableWriter$.apply(TableWriter.scala:382)
at com.datastax.spark.connector.RDDFunctions.saveToCassandra(RDDFunctions.scala:35)
... 60 elided
I figured out that the insertion works if my RDD is of type:
rdd: org.apache.spark.rdd.RDD[(Double, Double, Double, Double, Double, Double, Double, Double, Double, Double)]
But the one I get from my code is of type org.apache.spark.rdd.RDD[Double].
I can't use a Scala tuple such as Tuple9, because I don't know before execution how many elements my list is going to contain. This solution also doesn't fit my problem, because some of my CSV files have more than 100 columns and tuples stop at Tuple22.
Thanks for your help
Upvotes: 1
Views: 3115
Reputation: 35692
When you see this error for a join column, you sometimes need to wrap the join value in a Tuple1, e.g.:
map({case a=> Tuple1(a.p:BigInt) })
.joinWithCassandraTable("keyspacename", "tableName", joinColumns = SomeColumns("columnName"))
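To make the Tuple1 wrapping concrete, here is a minimal sketch in plain Scala, with no Spark or Cassandra dependency. The assumption behind it (not confirmed by anything in this thread) is that the connector maps the elements of a tuple to columns by position, whereas a bare value such as a Double is treated as an object whose properties must match column names, which would explain the "Columns not found in Double" message. The object name and the sample keys are hypothetical.

```scala
object Tuple1Sketch {
  // Hypothetical join-key values standing in for a.p in the snippet above.
  val keys: Seq[BigInt] = Seq(BigInt(1), BigInt(2), BigInt(3))

  // Wrapping each bare value yields a one-element tuple, i.e. a Product of arity 1.
  val wrapped: Seq[Tuple1[BigInt]] = keys.map(k => Tuple1(k))

  def main(args: Array[String]): Unit = {
    wrapped.foreach(t => println(s"arity=${t.productArity}, value=${t._1}"))
  }
}
```

The wrapped sequence is what you would then pass on to the join in place of the bare values.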
Upvotes: 1
Reputation: 2280
As @SergGr mentioned, a Cassandra table has a schema with known columns, so you need to map your Array to the Cassandra schema before saving to the Cassandra database. You can use a case class for this. Try the following code; I assume each column in the Cassandra table is of type Double.
// create a case class equivalent to your Cassandra table
case class Schema(collecttime: Double,
                  sbnid: Double,
                  enodebid: Double,
                  rackid: Double,
                  shelfid: Double,
                  slotid: Double,
                  channelid: Double,
                  c373910000: Double,
                  c373910001: Double,
                  c373910002: Double)

object test {
  import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
    val sc = new SparkContext(conf)
    val nameTable = "artport"
    val ligne = "20171,16,165481,51,1,1,4,-79.6000,-101.7000,-98.9000"
    // parse the ligne string into a Schema case class
    val schema = parseString(ligne)
    // get RDD[Schema]
    val rdd = sc.parallelize(Seq(schema))
    // now you can save this RDD to Cassandra
    rdd.saveToCassandra("db", nameTable)
  }

  // function to parse a string into a Schema case class
  def parseString(s: String): Schema = {
    // get each field from the string array (extra trailing fields are ignored)
    val Array(collecttime, sbnid, enodebid, rackid, shelfid, slotid,
      channelid, c373910000, c373910001, c373910002, _*) = s.split(",").map(_.toDouble)
    // map those fields to the Schema class
    Schema(collecttime,
      sbnid,
      enodebid,
      rackid,
      shelfid,
      slotid,
      channelid,
      c373910000,
      c373910001,
      c373910002)
  }
}
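Since the question mentions files with a varying number of columns, the same "map the values to the schema" idea can also be sketched without a fixed case class, by pairing column names with parsed values. This is only an illustration in plain Scala, not the connector's API: `RowMapSketch` and `parseLine` are hypothetical names, and the column list is copied from the error message in the question (in practice it might come from a CSV header or the table definition).

```scala
import scala.util.Try

object RowMapSketch {
  // Column names taken from the error message in the question (assumed to match the table).
  val columns: Seq[String] = Seq(
    "collecttime", "sbnid", "enodebid", "rackid", "shelfid",
    "slotid", "channelid", "c373910000", "c373910001", "c373910002")

  // Hypothetical helper: returns None when the field count does not match
  // the column count or when a field is not a valid number.
  def parseLine(cols: Seq[String], line: String): Option[Map[String, Double]] = {
    val fields = line.split(",").map(_.trim)
    if (fields.length != cols.length) None
    else Try(cols.zip(fields.map(_.toDouble).toSeq).toMap).toOption
  }

  def main(args: Array[String]): Unit = {
    val ligne = "20171,16,165481,51,1,1,4,-79.6000,-101.7000,-98.9000"
    println(parseLine(columns, ligne))
  }
}
```

Unlike the pattern match in parseString, this returns None for a malformed line instead of throwing. As far as I know the connector also provides CassandraRow.fromMap for turning such a name-to-value map into a row that can be saved, which would avoid the Tuple22 limit, but check that against the documentation for your connector version.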
Upvotes: 2