user3279189

Reputation: 1653

Scala case class modularization

I'm new to Scala and I need to refactor / modularize my code.

My code looks like this:

case class dim1(col1: String, col2: Int, col3: String)

// Read the tab-separated file and convert each row to a dim1
val dim1Rdd = sc.textFile("s3n://dim1").map { row =>
  val parts = row.split("\t")
  dim1(parts(0), parts(1).toInt, parts(2))
}

case class dim2(col1: String, col2: Int)

val dim2Rdd = sc.textFile("s3n://dim2").map { row =>
  val parts = row.split("\t")
  dim2(parts(0), parts(1).toInt)
}

case class dim3(col1: String, col2: Int, col3: String, col4: Int)

val dim3Rdd = sc.textFile("s3n://dim3").map { row =>
  val parts = row.split("\t")
  dim3(parts(0), parts(1).toInt, parts(2), parts(3).toInt)
}

case class dim4(col1: String, col2: String, col3: Int)

val dim4Rdd = sc.textFile("s3n://dim4").map { row =>
  val parts = row.split("\t")
  dim4(parts(0), parts(1), parts(2).toInt)
}

This is Scala ETL transform code that runs on Apache Spark.

Here are the steps I have:

  1. Define a case class for every dimension.
  2. Read a file from S3 and map each row to the respective case class, converting data types where required.

These steps are highly repetitive, and I would like to write a function like this:

readAndMap(datasetlocation: String,caseclassnametomap: String)

With this, my code would become:

readAndMap("s3n://dim1",dim1)
readAndMap("s3n://dim2",dim2)
readAndMap("s3n://dim3",dim3)
readAndMap("s3n://dim4",dim4)

Any examples or pointers would be highly appreciated.

Thanks

Upvotes: 1

Views: 548

Answers (1)

tiran

Reputation: 2431

You can do something like this:

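import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// A: ClassTag is needed because RDD.map requires a ClassTag for its result type
def readAndMap[A: ClassTag](datasetLocation: String)(createA: List[String] => A): RDD[A] = {
  sc.textFile(datasetLocation).map { row =>
    createA(row.split("\t").toList)
  }
}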

You can call it like this:

readAndMap[dim1]("s3n://dim1"){ parts => dim1(parts(0),parts(1).toInt,parts(2)) }
readAndMap[dim2]("s3n://dim2"){ parts => dim2(parts(0),parts(1).toInt) }
readAndMap[dim3]("s3n://dim3"){ parts => dim3(parts(0),parts(1).toInt,parts(2),parts(3).toInt) }
readAndMap[dim4]("s3n://dim4"){ parts => dim4(parts(0),parts(1),parts(2).toInt) }

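You cannot simply pass the case class itself and have the method construct an instance, because the apply methods of the case classes have different arities and field types. Each call therefore has to supply its own function from the split fields to the case class.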
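
If you want to try the pattern without a Spark cluster, here is a minimal sketch that works on in-memory lines instead of sc.textFile. The names ReadAndMapSketch, parseLines, toDim1 and toDim2 are made up for illustration, and only two of the four case classes are shown. The idea is the same: one generic function, plus one small parser per case class that knows its own arity and conversions.

```scala
// Case classes as in the question (only two shown here).
case class dim1(col1: String, col2: Int, col3: String)
case class dim2(col1: String, col2: Int)

object ReadAndMapSketch {
  // Hypothetical Spark-free stand-in for readAndMap: same shape,
  // but it takes the lines directly instead of reading them from S3.
  def parseLines[A](lines: Seq[String])(createA: List[String] => A): Seq[A] =
    lines.map(row => createA(row.split("\t").toList))

  // One parser per case class; each handles its own arity and type conversions.
  val toDim1: List[String] => dim1 =
    parts => dim1(parts(0), parts(1).toInt, parts(2))
  val toDim2: List[String] => dim2 =
    parts => dim2(parts(0), parts(1).toInt)

  // Sample tab-separated rows (invented data, for illustration only).
  val dim1Rows: Seq[dim1] = parseLines(Seq("a\t1\tx", "b\t2\ty"))(toDim1)
  val dim2Rows: Seq[dim2] = parseLines(Seq("c\t3"))(toDim2)
}
```

Keeping the parsers as named values means the same function can be handed to the Spark version too, e.g. readAndMap("s3n://dim1")(toDim1).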

Upvotes: 1
