Hoori M.

Reputation: 730

Mapping an input file to different RDDs

I have a text file consisting of columns of integers. Assuming the file has N columns, I need N-1 PairRDDs: each PairRDD has one of columns 0 to N-2 as its key and the last column as its value. The number of columns varies each time I run the program, so I don't know the number of RDDs before the run.

The code below gives a "Task not serializable" error.

val inputFile = sc.textFile(path).persist()
for (dim <- 0 to (numberOfColumns - 2)) {
  val temp = inputFile.map(line => {
    val lines = line.split(',')
    (lines(dim), lines(numberOfColumns - 1))
  })
}

I appreciate any help for solving this issue.

Upvotes: 1

Views: 240

Answers (2)

Hoori M.

Reputation: 730

In my code I had references to global fields of the class, so Spark had to send the whole class instance to the executors for the closures to access those fields, and my class was not serializable.

I copied all the global fields into local variables in my method, so only those local variables were captured and sent to the executors, and the problem was solved.
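For illustration, a minimal sketch of that pattern (the class, field, and method names here are hypothetical, not the original code):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical sketch: copy instance fields into local vals before using
// them in the closure, so only those values (not `this`) get captured.
class ColumnMapper(sc: SparkContext, path: String, numberOfColumns: Int) {

  def buildPairRdds(): Seq[RDD[(String, String)]] = {
    val nCols = numberOfColumns              // local copy captured by the closure
    val inputFile = sc.textFile(path).persist()

    (0 to (nCols - 2)).map { dim =>
      inputFile.map { line =>
        val lines = line.split(',')
        (lines(dim), lines(nCols - 1))       // references locals only, not class fields
      }
    }
  }
}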

Upvotes: 1

anuj saxena

Reputation: 279

Move the operation you are performing out of the calling class into a serializable class or an object:

import org.apache.spark.rdd.RDD

class RDDOperation extends Serializable {

  def perform(inputFile: RDD[String], numberOfColumns: Int) = {
    for (dim <- 0 to (numberOfColumns - 2)) yield {
      inputFile.map { line =>
        val lines = line.split(',')
        (lines(dim), lines(numberOfColumns - 1))
      }
    }
  }
}

The reason for the exception is that an RDD's elements are partitioned across the nodes of the cluster. When we use map/flatMap on an RDD, the operation runs on multiple nodes, so everything referenced inside the map function must be serializable. Moving it to a serializable class or an object achieves that.

Also, prefer returning values from Scala functions; hence I have used yield here, which returns a collection of pair RDDs.
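For example, the driver code might call it like this (the SparkContext, file path, and column count below are placeholders, not taken from the question):

// Hypothetical driver-side usage; `sc`, the path, and the column count are placeholders.
val inputFile: RDD[String] = sc.textFile("data/input.csv").persist()
val pairRdds: Seq[RDD[(String, String)]] = new RDDOperation().perform(inputFile, 5)

// Each element of pairRdds is one (column, lastColumn) PairRDD.
pairRdds.foreach(rdd => println(rdd.take(3).mkString(", ")))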

Also, you can refactor the perform method so that it doesn't depend on numberOfColumns, like this:

def perform(inputFile: RDD[String]): RDD[(String, String)] = {
  inputFile.flatMap { line =>
    val lines = line.split(',').toList

    lines.reverse match {
      case lastColumn :: rest =>
        // pair every other column (by position) with the last column
        rest.reverse.map(column => (column, lastColumn))
      case _ => List.empty[(String, String)]
    }
  }
}
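For instance, given an input line "3,7,42", this version emits the pairs ("3", "42") and ("7", "42"), flattened into a single RDD rather than one RDD per column.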

Upvotes: 2
