I am trying to write data to a CSV file. I have four columns, which I have created as:
val csvFields = Array("Serial Number", "Record Type", "First File value", "Second file value")
Other than the serial number, the other three fields are lists:
Second_file_value = List("B", "gjgbn", "fgbhjf", "dfjf")
First_File_Value = List("A", "abhc", "agch", "mknk")
Record_type = List('1', '2', '3', '4')
val outputFile = new BufferedWriter(new FileWriter("Resulet.csv"))
val csvWriter = new CSVWriter(outputFile)
val listOfRecords = new ListBuffer[Array[String]]()
listOfRecords :+ csvFields
I am using this loop for writing into the columns:
for (i <- 1 until 30) {
  listOfRecords += Array(i.toString, Record_type, First_File_Value, Second_file_value)
}
csvWriter.writeAll(listOfRecords.toList)
outputFile.close()
The problem I am facing is that the CSV file is filled with 30 rows of the same values (the first row's values); the values in the lists are not getting iterated.
Any references will also be helpful.
Without a complete example (as in a compiling Main file), it can't be said why you are getting the same row over and over. The snippet you posted is correct in isolation.
scala> val lb: ListBuffer[Array[String]] = new ListBuffer[Array[String]]()
lb: scala.collection.mutable.ListBuffer[Array[String]] = ListBuffer()
scala> for (i <- 1 until 30){lb += Array(i.toString)}
scala> lb.toList
res5: List[Array[String]] = List(Array(1), Array(2), Array(3), Array(4), Array(5), Array(6), Array(7), Array(8), Array(9), Array(10), Array(11), Array(12), Array(13), Array(14), Array(15), Array(16), Array(17), Array(18), Array(19), Array(20), Array(21), Array(22), Array(23), Array(24), Array(25), Array(26), Array(27), Array(28), Array(29))
However, there are a number of ways you can do this better in general that might help you avoid this and other bugs.
In Scala it is generally considered better to prefer immutable structures over mutable ones as an idiom. Given that, I'd suggest you construct a function to add the serial prefix to your rows using an immutable method. There are a number of ways to do this, but the most fundamental one is a fold operation. If you are not familiar with it, a fold can be thought of as a transformation over a structure, like the functional version of a for loop.
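For instance, here is a fold in its simplest form: the accumulator starts at 0 and each element is combined into it in turn, exactly the job a for loop with a mutable counter would do.
scala> List(1, 2, 3, 4).foldLeft(0)((acc, n) => acc + n)
res0: Int = 10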
With that in mind, here is how you might take some rows, which are a List[List[String]], and add a numeric prefix to all of them.
def addPrefix(lls: List[List[String]]): List[List[String]] =
  lls.foldLeft((1, List.empty[List[String]])){
    // You don't need to annotate the types here, I just did that for clarity.
    case ((serial: Int, acc: List[List[String]]), value: List[String]) =>
      (serial + 1, (serial.toString +: value) +: acc)
  }._2.reverse
A foldLeft builds up the list in the reverse of what we want, which is why I call .reverse at the end. The reason for this is an artifact of how the stack works when traversing structures and is beyond the scope of this question, but there are many good articles on why to use foldLeft or foldRight.
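You can see the reversal directly in the REPL: prepending to the accumulator, which is the cheap operation on List, produces the elements in reverse order.
scala> List(1, 2, 3).foldLeft(List.empty[Int])((acc, n) => n +: acc)
res1: List[Int] = List(3, 2, 1)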
From what I read above, this is what your rows look like in the example.
val columnOne: List[String] =
  List('1', '2', '3', '4').map(_.toString)
val columnTwo: List[String] =
  List("A", "abhc", "agch", "mknk")
val columnThree: List[String] =
  List("B", "gjgbn", "fgbhjf", "dfjf")
val rows: List[List[String]] =
  columnOne.zip(columnTwo.zip(columnThree)).foldLeft(List.empty[List[String]]){
    case (acc, (a, (b, c))) => List(a, b, c) +: acc
  }.reverse
Which yields this.
scala> rows.foreach(println)
List(1, A, B)
List(2, abhc, gjgbn)
List(3, agch, fgbhjf)
List(4, mknk, dfjf)
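As an aside, since this transformation produces exactly one output row per input element, the same rows could be built with a plain map instead of a fold; the fold version above is shown mainly for consistency with addPrefix. (rowsViaMap here is just an illustrative name.)
// Equivalent construction with map; no accumulator or .reverse needed.
val rowsViaMap: List[List[String]] =
  columnOne.zip(columnTwo.zip(columnThree)).map{
    case (a, (b, c)) => List(a, b, c)
  }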
Let's try calling our function with that as the input.
scala> addPrefix(rows).foreach(println)
List(1, 1, A, B)
List(2, 2, abhc, gjgbn)
List(3, 3, agch, fgbhjf)
List(4, 4, mknk, dfjf)
Okay, that looks good.
Now to write the CSV file. Because CSVWriter works in terms of Java collection types, we need to convert our Scala types to Java collections. In Scala you should do this at the last possible moment. The reason for this is that Scala's types are designed to work well with Scala and we don't want to lose that ability early. They are also safer than the parallel Java types in terms of immutability (if you are using the immutable variants, which this example does).
Let's define a function writeCsvFile that takes a filename, a header row, and a list of rows and writes it out. Again there are many ways to do this correctly, but here is a simple example.
def writeCsvFile(
  fileName: String,
  header: List[String],
  rows: List[List[String]]
): Try[Unit] =
  Try(new CSVWriter(new BufferedWriter(new FileWriter(fileName)))).flatMap((csvWriter: CSVWriter) =>
    Try {
      csvWriter.writeAll(
        (header +: rows).map(_.toArray).asJava
      )
      csvWriter.close()
    } match {
      case f @ Failure(_) =>
        // Always return the original failure. In production code we might
        // define a new exception which wraps both exceptions in the case
        // they both fail, but that is omitted here.
        Try(csvWriter.close()).recoverWith {
          case _ => f
        }
      case success =>
        success
    }
  )
Let's break that down for a moment. I am using the Try data type from the scala.util package. It is similar to the language-level try/catch/finally blocks, but rather than using a special construct to catch exceptions, it uses a normal value. This is another common idiom in Scala: prefer plain language values over special language control flow constructs.
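To make that concrete, a Try wraps either the successful result or the exception that was thrown, as an ordinary value you can pattern match on:
scala> import scala.util.Try
import scala.util.Try

scala> Try("42".toInt)
res2: scala.util.Try[Int] = Success(42)

scala> Try("not a number".toInt)
res3: scala.util.Try[Int] = Failure(java.lang.NumberFormatException: For input string: "not a number")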
Let's take a closer look at the expression (header +: rows).map(_.toArray).asJava. This small expression is doing quite a few operations. First, we add our header row to the front of our list of rows with (header +: rows). Then, since CSVWriter wants an Iterable<String[]>, we convert the inner type to Array and then the outer type to a Java Iterable. The .asJava call is what does the outer type conversion, and you get it by importing scala.collection.JavaConverters._, which provides implicit conversions between Scala and Java types.
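A quick REPL check shows the conversion in action (on Scala 2.13+ the same conversions live in scala.jdk.CollectionConverters, but the JavaConverters import matches the code here):
scala> import scala.collection.JavaConverters._
import scala.collection.JavaConverters._

scala> List("a", "b", "c").asJava
res4: java.util.List[String] = [a, b, c]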
The rest of the function is pretty straightforward. We write the rows out, then check if there was a failure. If there was, we ensure that we still attempt to close the CSVWriter.
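As a side note, if you are on Scala 2.13 or later (an assumption; nothing in this answer requires it), scala.util.Using packages up exactly this acquire/use/close-even-on-failure pattern. A minimal sketch, reusing the com.opencsv, java.io, and JavaConverters imports from the full example below (writeCsvFileUsing is an illustrative name):
// Sketch only: requires Scala 2.13+, where scala.util.Using closes the
// writer for you even if the write fails, returning the failure as a Try.
import scala.util.{Try, Using}

def writeCsvFileUsing(fileName: String, rows: List[List[String]]): Try[Unit] =
  Using(new CSVWriter(new BufferedWriter(new FileWriter(fileName)))) { csvWriter =>
    csvWriter.writeAll(rows.map(_.toArray).asJava)
  }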
I've included a full compiling example here.
import com.opencsv._
import java.io._
import scala.collection.JavaConverters._
import scala.util._

object Main {

  val header: List[String] =
    List("Serial Number", "Record Type", "First File value", "Second file value")

  val columnOne: List[String] =
    List('1', '2', '3', '4').map(_.toString)

  val columnTwo: List[String] =
    List("A", "abhc", "agch", "mknk")

  val columnThree: List[String] =
    List("B", "gjgbn", "fgbhjf", "dfjf")

  val rows: List[List[String]] =
    columnOne.zip(columnTwo.zip(columnThree)).foldLeft(List.empty[List[String]]){
      case (acc, (a, (b, c))) => List(a, b, c) +: acc
    }.reverse

  def addPrefix(lls: List[List[String]]): List[List[String]] =
    lls.foldLeft((1, List.empty[List[String]])){
      case ((serial: Int, acc: List[List[String]]), value: List[String]) =>
        (serial + 1, (serial.toString +: value) +: acc)
    }._2.reverse

  def writeCsvFile(
    fileName: String,
    header: List[String],
    rows: List[List[String]]
  ): Try[Unit] =
    Try(new CSVWriter(new BufferedWriter(new FileWriter(fileName)))).flatMap((csvWriter: CSVWriter) =>
      Try {
        csvWriter.writeAll(
          (header +: rows).map(_.toArray).asJava
        )
        csvWriter.close()
      } match {
        case f @ Failure(_) =>
          // Always return the original failure. In production code we might
          // define a new exception which wraps both exceptions in the case
          // they both fail, but that is omitted here.
          Try(csvWriter.close()).recoverWith {
            case _ => f
          }
        case success =>
          success
      }
    )

  def main(args: Array[String]): Unit = {
    println(writeCsvFile("/tmp/test.csv", header, addPrefix(rows)))
  }
}
Here is the contents of the file after running that program.
"Serial Number","Record Type","First File value","Second file value"
"1","1","A","B"
"2","2","abhc","gjgbn"
"3","3","agch","fgbhjf"
"4","4","mknk","dfjf"
I noticed in the comments on the original post that you were using "au.com.bytecode" % "opencsv" % "2.4". I'm not familiar with the opencsv library in general, but according to Maven Central that appears to be a very old fork of the primary repo. I'd suggest you use the primary repo: https://search.maven.org/search?q=opencsv
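For sbt that would look something like the following; the version number shown is only illustrative, so check Maven Central for the current release.
// build.sbt -- the maintained group/artifact on Maven Central.
// The version is an assumption; pick the latest from the search link above.
libraryDependencies += "com.opencsv" % "opencsv" % "4.1"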
People often get concerned that using immutable data structures and techniques requires a performance trade-off. This can be the case, but usually the asymptotic complexity is unchanged. The above solution is O(n), where n is the number of rows. It has a higher constant factor than a mutable solution, but generally that is not significant. If it were, there are techniques that could be employed, such as more explicit recursion in addPrefix, that would mitigate this. However, you should never optimize like that unless you really need to, as it makes the code more error prone and less idiomatic (and thus less readable).
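For completeness, here is a sketch of what that more explicit recursion might look like: a tail-recursive inner loop that carries the serial number and accumulator as plain parameters instead of a tuple. (addPrefixRec is an illustrative name, not part of the solution above.)
import scala.annotation.tailrec

// Behaviorally equivalent to addPrefix; @tailrec makes the compiler verify
// that the recursion compiles down to a loop, avoiding stack growth.
def addPrefixRec(lls: List[List[String]]): List[List[String]] = {
  @tailrec
  def loop(serial: Int, remaining: List[List[String]], acc: List[List[String]]): List[List[String]] =
    remaining match {
      case Nil         => acc.reverse
      case row :: rest => loop(serial + 1, rest, (serial.toString +: row) +: acc)
    }
  loop(1, lls, Nil)
}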