Vlad

Reputation: 33

Writing files takes a lot of time

I am writing three Lists of tripleInt with approximately 277270 rows. My class tripleInt is the following:

class tripleInt  (var sub:Int, var pre:Int, var obj:Int)

Additionally, I create each list with Apache Jena components from an RDF file: I transform the RDF elements to ids and store these ids in the different lists. Once I have the lists, I write the files with the following code:

import java.io.{BufferedWriter, FileWriter}

class Indexes (val listSPO:List[tripleInt], val listPSO:List[tripleInt], val listOSP:List[tripleInt] ){
  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  val pl = listPSO.sortBy(l => (l.sub, l.pre))
  //val ol = listOSP.sortBy(l => (l.sub, l.pre))

  var y1:Int=0
  var y2:Int=0
  var y3:Int=0

  val fstream:FileWriter = new FileWriter("patSPO.dat")
  var out:BufferedWriter = new BufferedWriter(fstream)
  //val fstream:FileOutputStream = new FileOutputStream("patSPO.dat")
  //var out:ObjectOutputStream = new ObjectOutputStream(fstream)
  //out.writeObject(listSPO)
  val fstream2:FileWriter = new FileWriter("patPSO.dat")
  var out2:BufferedWriter = new BufferedWriter(fstream2)
  /*val fstream3:FileOutputStream = new FileOutputStream("patOSP.dat")
  var out3:BufferedOutputStream = new BufferedOutputStream(fstream3)*/

  for ( a <- 0 to sl.size-1){
    y1 = sl(a).sub
    y2 = sl(a).pre
    y3 = sl(a).obj
    out.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
  }
  for ( a <- 0 to pl.size-1){
    y1 = pl(a).sub
    y2 = pl(a).pre
    y3 = pl(a).obj
    out2.write((y1.toString+","+y2.toString+","+y3.toString+"\n"))
  }
  out.close()
  out2.close()
}

This process takes approximately 30 minutes. My PC has 16 GB of RAM and a Core i7, so I don't understand why it takes so long. Is there a way to optimize this?

Thank you

Upvotes: 0

Views: 53

Answers (1)

Gábor Bakos

Reputation: 9100

Yes, you need to choose your data structures wisely. List is a singly linked list meant for sequential access (Seq), not random access (IndexedSeq): sl(a) has to walk a elements from the head of the list, so indexing inside your loop makes the whole loop O(n^2). Iterating over the list directly is O(n), so the following should be much faster (and hopefully easier to read too):

import java.io.{BufferedWriter, FileWriter}

class Indexes (val listSPO: List[tripleInt], val listPSO: List[tripleInt], val listOSP: List[tripleInt] ){
  val sl = listSPO.sortBy(l => (l.sub, l.pre))
  val pl = listPSO.sortBy(l => (l.sub, l.pre))

  var y1:Int=0
  var y2:Int=0
  var y3:Int=0

  val fstream:FileWriter = new FileWriter("patSPO.dat")
  val out:BufferedWriter = new BufferedWriter(fstream)

  // Iterate over the List directly instead of indexing it: each step is O(1)
  try {
    for (s <- sl){
      y1 = s.sub
      y2 = s.pre
      y3 = s.obj
      out.write(s"$y1,$y2,$y3\n")
    }
  } finally {
    out.close() // closed even if a write throws
  }

  val fstream2:FileWriter = new FileWriter("patPSO.dat")
  val out2:BufferedWriter = new BufferedWriter(fstream2)

  try {
    for (p <- pl){
      y1 = p.sub
      y2 = p.pre
      y3 = p.obj
      out2.write(s"$y1,$y2,$y3\n")
    }
  } finally {
    out2.close()
  }
}
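The two loops differ only in the list and the output file, so one option is to factor them into a helper. This is a minimal, hypothetical refactoring (the names writeTriples and writeTriplesToFile are illustrative, not from the question), and it assumes Scala 2.13 or later for scala.util.Using, which closes the writer even if a write throws, the same job as a try/finally.

```scala
import java.io.{BufferedWriter, FileWriter, Writer}
import scala.util.Using

// Same shape as the question's class.
class tripleInt(var sub: Int, var pre: Int, var obj: Int)

// Hypothetical helper: writes one "sub,pre,obj" line per triple.
// foreach walks the List once from head to tail, so this is O(n).
def writeTriples(triples: Iterable[tripleInt], w: Writer): Unit =
  triples.foreach(t => w.write(s"${t.sub},${t.pre},${t.obj}\n"))

// Hypothetical helper: opens a buffered file writer and always closes it,
// even if writing throws (Using.resource requires Scala 2.13+).
def writeTriplesToFile(path: String, triples: Iterable[tripleInt]): Unit =
  Using.resource(new BufferedWriter(new FileWriter(path))) { out =>
    writeTriples(triples, out)
  }
```

Inside Indexes, each file then becomes a single call, for example writeTriplesToFile("patSPO.dat", sl). Taking a Writer in writeTriples also makes it easy to check the output against a StringWriter.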

(It would not hurt to use IndexedSeq/Vector as inputs, but there may be reasons why List is preferred in your case.)
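To illustrate the IndexedSeq point, here is a minimal sketch assuming the List can be converted once with toVector (writeIndexed is a hypothetical name). Vector indexing is effectively constant time, so even an index-based loop like the one in the question stays roughly linear; the one-off conversion is itself O(n).

```scala
import java.io.Writer

// Same shape as the question's class.
class tripleInt(var sub: Int, var pre: Int, var obj: Int)

// Hypothetical helper: converts the List to a Vector once, then indexes it.
// v(a) on a Vector does not walk from the head, unlike a List.
def writeIndexed(triples: List[tripleInt], w: Writer): Unit = {
  val v: Vector[tripleInt] = triples.toVector // single O(n) conversion
  for (a <- 0 until v.size) {
    val t = v(a)
    w.write(s"${t.sub},${t.pre},${t.obj}\n")
  }
}
```

Iterating directly with foreach or a for comprehension is still the simpler fix. The conversion is mainly worth it if other code genuinely needs random access.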

Upvotes: 1
