Shakti
Shakti

Reputation: 2033

Scala - Mutable ListBuffer or Immutable list to choose?

I am writing a simple scala program that will calculate moving average of list of quotes of a defined size say 100. Quotes will be coming at the rate of approx 5-6 quotes per second .

1) Is it good to keep the quotes in a immutable scala list wherein I guess each time a quote comes a new list will be created ? Is it going to take too much unnecessary memory ?

OR

2) Is it good to keep the quotes in a mutable list like ListBuffer wherein I will removing the oldest quote and pushing the new quotes each time a quote comes .

Current code

package com.example.csv

import scala.io.Source
import scala.collection.mutable.ListBuffer

object CsvFileParser {
  val WINDOW_SIZE = 25;
  var quotes = ListBuffer(0.0);
  def main(args: Array[String]) = {
    val src = Source.fromFile("GBP_USD_Week1.csv");
    //drop header and split the comma separated tokens
    val iter = src.getLines().drop(1).map(_.split(","));
    // Sliding window reads ahead // remove it
    val index = 0;

    while(iter.hasNext) {
     processRecord(iter.next) 
    }
    src.close()
  }

  def processRecord(record: Array[String]) = {
    if(quotes.length < WINDOW_SIZE){
      quotes += record(4).toDouble; 
    }else {
      val movingAverage = quotes.sum / quotes.length
      quotes.map(_ + " ").foreach(print)
      println("\nMoving Average " + movingAverage)
      quotes = quotes.tail;
      quotes += record(4).toDouble;

    }
  }

  /*def simpleMovingAverage(values: ListBuffer[Double], period: Int): ListBuffer[Double] = {
      ListBuffer.fill(period - 1)(0.0) ++ (values.sliding(period).map(_.sum).map(_ / period))  
  }*/

}

Upvotes: 1

Views: 2961

Answers (2)

Andreas Neumann
Andreas Neumann

Reputation: 10894

Immutable structures in Scala use a technique called structural sharing. For Lists it's also mentioned in the Scaladocs:

The functional list is characterized by persistence and structural sharing, thus offering considerable performance and space consumption benefits in some scenarios.

So:

  • the immutable List will take up more space, but not much more and will blend in better with other Scla Constructs
  • the mutable Version is prone to concurrency issues, will be a little faster and use a litte bit less space.

As for the code:

  • If you use a mutable Structure you can get rid of the var, keep it if you want to overwrite an immutable one
  • It would be considered good style having a method returning the List instead of manipulating a mutable Structure or having an global var
  • If you process the file like in the code snippet below, you need not care for closing the file, also the file will be processed concurrently

    def processLines(path: String) : Unit = for { line <- Source.fromFile("GBP_USD_Week1.csv").getLines.tail.par record <- line.split(",") } processRecord(record)

Upvotes: 0

axel22
axel22

Reputation: 32335

This depends on whether you keep the items in a reverse order or not. Lists will append elements in constant time at the beginning (:: will not create an entirely new list), while the ListBuffer#+= will create a node at append it at the end of the list.

There should be little or no performance difference, nor memory footprint difference, between using List and ListBuffer -- internally these are the same data-structures. The only question is will you need to reverse the list at the end -- if you will, this will require creating a second list, so it may be slower.

In your case the decision to use list buffers was correct -- you need to remove the first element, and append to the other side of the collection, which is something that ordinary functional lists would not allow you to do efficiently.

However, in your code you call tail on the ListBuffer. This will actually copy the contents of the list buffer rather than giving you a cheap tail (O(WINDOW_SIZE) operation). You should call the quotes.remove(0, 1) to remove the first entry -- this will just mutate the current buffer, an O(1) operation.

For very fast arrivals of quotes, you might consider using a customized data structure -- for list buffer you will pay the cost of boxing. However, in the case that there are 5-6 quotes per second and the WINDOW_SIZE is around 100, you don't have to worry -- even calling tail on the list buffer should be more than acceptable.

Upvotes: 4

Related Questions