Vince.Bdn

Reputation: 1175

Scala - why does Double consume less memory than Float in this case?

Here's a strange behavior I ran into, and I can't find any hint as to why it happens. In this example I use the estimate method of Spark's SizeEstimator. I haven't found any glitch in its code, so assuming it gives a good estimate of memory usage, why do I get the following?

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.util.SizeEstimator

val buf1 = new ArrayBuffer[(Int,Double)]
var i = 0
while (i < 3) {
   buf1 += ((i,i.toDouble))
   i += 1
}
System.out.println(s"Raw size with doubles: ${SizeEstimator.estimate(buf1)}")
val ite1 = buf1.toIterator
var size1: Long = 0L
while (ite1.hasNext) {
   val cur = ite1.next()
   size1 += SizeEstimator.estimate(cur)
}
System.out.println(s"Size with doubles: $size1")

val buf2 = new ArrayBuffer[(Int,Float)]
i = 0
while (i < 3) {
   buf2 += ((i,i.toFloat))
   i += 1
}
System.out.println(s"Raw size with floats: ${SizeEstimator.estimate(buf2)}")
val ite2 = buf2.toIterator
var size2: Long = 0L
while (ite2.hasNext) {
   val cur = ite2.next()
   size2 += SizeEstimator.estimate(cur)
}
System.out.println(s"Size with floats: $size2")

The console output prints:

Raw size with doubles: 200
Size with doubles: 96
Raw size with floats: 272
Size with floats: 168

So my question is quite naive: why do floats take more memory than doubles in this case? And why does the gap get even wider when I go through an iterator and sum the per-element estimates? (On the raw buffers, the doubles take about 74% of the floats' size, 200 vs. 272; summed per element, this drops to about 57%, 96 vs. 168.)

(For more context: I ran into this when trying to "optimize" a Spark application by changing Double to Float, and found out that it actually took more memory with floats than with doubles...)

P.S.: It's not due to the small size of the buffers (here 3). If I put 100 elements instead, I get:

Raw size with doubles: 3752
Size with doubles: 3200
Raw size with floats: 6152
Size with floats: 5600

and floats still consume more memory. But the ratio has stabilized, so I guess the different ratios seen with the iterator were due to some fixed overhead.

EDIT: It seems that Product2 is actually only specialized on Int, Long and Double:

trait Product2[@specialized(Int, Long, Double) +T1, @specialized(Int, Long, Double) +T2] extends Any with Product

Does anyone know why Float is not taken into account? Neither is Short, which leads to weird behavior...

Upvotes: 11

Views: 750

Answers (1)

Odomontois

Reputation: 16308

This is because Tuple2 is @specialized for Double but not specialized for Float.

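That means (Int,Double) is represented as a structure with two fields of the primitive Java types int and double, while (Int,Float) falls back to the generic Tuple2, whose fields are object references. The float is therefore stored as the wrapper type java.lang.Float (and, since no specialized variant applies, the int is boxed as java.lang.Integer as well).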
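
A quick way to see the difference is to look at the runtime class of each tuple. This is only a minimal sketch: the object name TupleSpecializationDemo and its members are made up for illustration, and the specialized class name in the comment is what a Scala 2.x compiler is expected to produce.

```scala
object TupleSpecializationDemo {
  // Hypothetical helpers, not part of the original question or of Spark.
  val intDouble: (Int, Double) = (1, 1.0)
  val intFloat: (Int, Float) = (1, 1.0f)

  // Runtime class names of the two tuples.
  def intDoubleClass: String = intDouble.getClass.getName
  def intFloatClass: String = intFloat.getClass.getName

  def main(args: Array[String]): Unit = {
    // On Scala 2.x this is expected to be the specialized subclass
    // scala.Tuple2$mcID$sp (I = int, D = double).
    println(intDoubleClass)
    // No specialized variant exists for Float, so this is plain scala.Tuple2.
    println(intFloatClass)
  }
}
```

If the two printed names differ, the (Int,Double) tuple is using a specialized subclass with primitive fields, while the (Int,Float) tuple is the generic class holding references.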

More discussion here
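
The per-element numbers in the question can be reproduced with some back-of-the-envelope arithmetic. The sketch below rests on assumptions that are not stated in the thread: a 64-bit HotSpot JVM with compressed references (12-byte object header, 4-byte references, 8-byte alignment), a specialized tuple class that still inherits the two (unused) reference fields of the generic Tuple2, and a non-specialized (Int,Float) tuple that holds references to two boxed values. The object name TupleSizeSketch and its members are hypothetical.

```scala
object TupleSizeSketch {
  // Assumed JVM layout (64-bit HotSpot, compressed oops); not from the thread.
  val HeaderBytes = 12L // object header
  val RefBytes = 4L     // compressed object reference
  val AlignBytes = 8L   // objects are padded to a multiple of 8 bytes

  // Round a size up to the next multiple of AlignBytes.
  def align(n: Long): Long = (n + AlignBytes - 1) / AlignBytes * AlignBytes

  // Specialized (Int, Double): two inherited reference fields (assumed unused)
  // plus a primitive int (4 bytes) and a primitive double (8 bytes).
  val intDoubleTuple: Long = align(HeaderBytes + 2 * RefBytes + 4 + 8)

  // Boxed java.lang.Integer and java.lang.Float: header plus a 4-byte payload.
  val boxedInt: Long = align(HeaderBytes + 4)
  val boxedFloat: Long = align(HeaderBytes + 4)

  // Generic (Int, Float): two references, each pointing at a boxed value.
  val intFloatTuple: Long = align(HeaderBytes + 2 * RefBytes) + boxedInt + boxedFloat

  def main(args: Array[String]): Unit = {
    println(s"(Int, Double): $intDoubleTuple bytes each, ${3 * intDoubleTuple} for 3")
    println(s"(Int, Float):  $intFloatTuple bytes each, ${3 * intFloatTuple} for 3")
  }
}
```

Under these assumptions the estimate is 32 bytes per (Int,Double) tuple and 56 bytes per (Int,Float) tuple, which matches the 96 vs. 168 reported for 3 elements and the 3200 vs. 5600 reported for 100.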

Upvotes: 13
