Reputation: 1653
I'm new to Scala and I need to refactor / modularize my code.
My code looks like this:
case class dim1(col1: String, col2: Int, col3: String)
val dim1 = sc.textFile("s3n://dim1").map { row =>
  val parts = row.split("\t")
  dim1(parts(0), parts(1).toInt, parts(2))
}
case class dim2(col1: String, col2: Int)
val dim2 = sc.textFile("s3n://dim2").map { row =>
  val parts = row.split("\t")
  dim2(parts(0), parts(1).toInt)
}
case class dim3(col1: String, col2: Int, col3: String, col4: Int)
val dim3 = sc.textFile("s3n://dim3").map { row =>
  val parts = row.split("\t")
  dim3(parts(0), parts(1).toInt, parts(2), parts(3).toInt)
}
case class dim4(col1: String, col2: String, col3: Int)
val dim4 = sc.textFile("s3n://dim4").map { row =>
  val parts = row.split("\t")
  dim4(parts(0), parts(1), parts(2).toInt)
}
This is Scala ETL transform code that runs on Apache Spark.
The same steps (read the file, split each row on tabs, map the parts into a case class) are repeated for every dataset, and I would like to write a function like this:
readAndMap(datasetLocation: String, caseClassNameToMap: String)
With this, my code would become:
readAndMap("s3n://dim1", dim1)
readAndMap("s3n://dim2", dim2)
readAndMap("s3n://dim3", dim3)
readAndMap("s3n://dim4", dim4)
Some examples / directions would be highly appreciated.
Thanks
Upvotes: 1
Views: 548
Reputation: 2431
You can do something like this:
def readAndMap[A](datasetLocation: String)(createA: List[String] => A) = {
  sc.textFile(datasetLocation).map { row =>
    createA(row.split("\t").toList)
  }
}
You can call this like:
readAndMap[dim1]("s3n://dim1") { parts => dim1(parts(0), parts(1).toInt, parts(2)) }
readAndMap[dim2]("s3n://dim2") { parts => dim2(parts(0), parts(1).toInt) }
readAndMap[dim3]("s3n://dim3") { parts => dim3(parts(0), parts(1).toInt, parts(2), parts(3).toInt) }
readAndMap[dim4]("s3n://dim4") { parts => dim4(parts(0), parts(1), parts(2).toInt) }
You cannot directly pass a case class and ask the method to construct an instance, because the arities of the case classes' apply methods differ from one another.
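If you want call sites as short as the readAndMap("s3n://dim1", dim1) form asked for in the question, one common workaround for the arity problem is a small typeclass that knows how to build each case class from the split parts. This is just a sketch: the RowParser trait and the Spark-free parseRows stand-in are my own names, not from any library; with Spark you would replace the Seq with sc.textFile(location):

```scala
case class dim1(col1: String, col2: Int, col3: String)
case class dim2(col1: String, col2: Int)

// A typeclass: one instance per case class, each knowing how to
// construct that class from the tab-split parts of a row.
trait RowParser[A] {
  def parse(parts: List[String]): A
}

object RowParser {
  implicit val dim1Parser: RowParser[dim1] =
    parts => dim1(parts(0), parts(1).toInt, parts(2))
  implicit val dim2Parser: RowParser[dim2] =
    parts => dim2(parts(0), parts(1).toInt)
}

// Spark-free stand-in for readAndMap: split each row on tabs and let
// the implicit parser build the case class. With Spark, `rows` would
// be sc.textFile(datasetLocation) and the result an RDD[A].
def parseRows[A](rows: Seq[String])(implicit p: RowParser[A]): Seq[A] =
  rows.map(row => p.parse(row.split("\t").toList))

// Call sites now only name the target type:
val d1 = parseRows[dim1](Seq("a\t1\tb"))
val d2 = parseRows[dim2](Seq("x\t7"))
```

The per-class mapping logic is written once, next to the case class, instead of being repeated at every call site.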
Upvotes: 1