Reputation: 1782
When running my program locally on a 16 GB MacBook Pro, I see the following log output:
15/04/10 20:07:50 INFO BlockManagerMaster: Updated info of block rdd_12_3
15/04/10 20:07:50 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
15/04/10 20:07:50 INFO BlockManagerInfo: Added rdd_12_6 in memory on 192.168.1.4:60005 (size: 854.0 KB, free: 682.9 MB)
15/04/10 20:07:50 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 8 non-empty blocks out of 8 blocks
15/04/10 20:07:50 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/04/10 20:07:50 INFO BlockManagerMaster: Updated info of block rdd_12_6
15/04/10 20:07:50 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
15/04/10 20:07:50 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 8 non-empty blocks out of 8 blocks
15/04/10 20:07:50 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/04/10 20:07:50 INFO ExternalAppendOnlyMap: Thread 67 spilling in-memory batch of 7.9 MB to disk (1 times so far)
15/04/10 20:07:50 INFO ExternalAppendOnlyMap: Thread 95 spilling in-memory batch of 5.0 MB to disk (1 times so far)
15/04/10 20:07:50 INFO ExternalAppendOnlyMap: Thread 66 spilling in-memory batch of 8.0 MB to disk (1 times so far)
15/04/10 20:07:50 INFO ExternalAppendOnlyMap: Thread 95 spilling in-memory batch of 5.0 MB to disk (2 timess so far)
15/04/10 20:07:50 INFO ExternalAppendOnlyMap: Thread 65 spilling in-memory batch of 5.8 MB to disk (1 times so far)
15/04/10 20:07:51 INFO ExternalAppendOnlyMap: Thread 67 spilling in-memory batch of 5.2 MB to disk (2 timess so far)
15/04/10 20:07:51 INFO ExternalAppendOnlyMap: Thread 66 spilling in-memory batch of 5.6 MB to disk (2 timess so far)
15/04/10 20:07:51 INFO ExternalAppendOnlyMap: Thread 95 spilling in-memory batch of 5.0 MB to disk (3 timess so far)
15/04/10 20:07:51 INFO ExternalAppendOnlyMap: Thread 65 spilling in-memory batch of 5.0 MB to disk (2 timess so far)
15/04/10 20:07:51 INFO ExternalAppendOnlyMap: Thread 61 spilling in-memory batch of 24.3 MB to disk (1 times so far)
15/04/10 20:07:52 INFO ExternalAppendOnlyMap: Thread 67 spilling in-memory batch of 5.0 MB to disk (3 timess so far)
15/04/10 20:07:52 INFO ExternalAppendOnlyMap: Thread 66 spilling in-memory batch of 5.0 MB to disk (3 timess so far)
15/04/10 20:07:52 INFO ExternalAppendOnlyMap: Thread 95 spilling in-memory batch of 5.0 MB to disk (4 timess so far)
15/04/10 20:07:52 INFO ExternalAppendOnlyMap: Thread 65 spilling in-memory batch of 5.3 MB to disk (3 timess so far)
15/04/10 20:07:52 INFO ExternalAppendOnlyMap: Thread 66 spilling in-memory batch of 5.0 MB to disk (4 timess so far)
15/04/10 20:07:52 INFO ExternalAppendOnlyMap: Thread 95 spilling in-memory batch of 5.2 MB to disk (5 timess so far)
15/04/10 20:07:52 INFO ExternalAppendOnlyMap: Thread 67 spilling in-memory batch of 5.8 MB to disk (4 timess so far)
15/04/10 20:07:53 INFO ExternalAppendOnlyMap: Thread 63 spilling in-memory batch of 35.6 MB to disk (1 times so far)
15/04/10 20:07:53 INFO ExternalAppendOnlyMap: Thread 65 spilling in-memory batch of 5.0 MB to disk (4 timess so far)
15/04/10 20:07:53 INFO ExternalAppendOnlyMap: Thread 66 spilling in-memory batch of 5.0 MB to disk (5 timess so far)
15/04/10 20:07:53 INFO ExternalAppendOnlyMap: Thread 95 spilling in-memory batch of 5.0 MB to disk (6 timess so far)
15/04/10 20:07:53 INFO MemoryStore: ensureFreeSpace(872616) called with curMem=1345765155, maxMem=2061647216
15/04/10 20:07:53 INFO MemoryStore: Block rdd_12_2 stored as values in memory (estimated size 852.2 KB, free 681.9 MB)
15/04/10 20:07:53 INFO BlockManagerInfo: Added rdd_12_2 in memory on 192.168.1.4:60005 (size: 852.2 KB, free: 682.0 MB)
15/04/10 20:07:53 INFO BlockManagerMaster: Updated info of block rdd_12_2
My understanding is that it has free memory (most of the memory is free, in fact), as shown by:
15/04/10 20:07:50 INFO BlockManagerInfo: Added rdd_12_6 in memory on 192.168.1.4:60005 (size: 854.0 KB, free: 682.9 MB)
And yet it is spilling to disk. I'm using a ~265 MB dataset, so it really shouldn't need to spill to disk.
For what it's worth:
15/04/10 20:06:50 INFO MemoryStore: MemoryStore started with capacity 1966.1 MB
With all this spilling to disk it's taking ~5 minutes to run through my program once.
Why is this occurring?
Upvotes: 3
Views: 11784
Reputation: 429
I found that one of my columns had nulls throughout, causing a skew that resulted in constant spills.
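To illustrate why nulls can skew the work, here is a rough plain-Python sketch (not Spark code). It assumes a hash-style partitioner that sends every null key to partition 0, which is my understanding of how Spark's hash partitioning treats nulls. The data, the partition count, and the function names are made up for the example.

```python
from collections import Counter


def partition_for(key, num_partitions):
    # Assumption for this sketch: a null key (None) always maps to
    # partition 0; every other key is spread by its hash.
    if key is None:
        return 0
    return hash(key) % num_partitions


def partition_sizes(keys, num_partitions):
    # Count how many records end up in each partition.
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical data: 1,000 distinct integer keys plus 9,000 rows with a null key.
keys = list(range(1000)) + [None] * 9000
sizes = partition_sizes(keys, 8)
print(sizes)  # [9125, 125, 125, 125, 125, 125, 125, 125]
```

One partition ends up holding over 90% of the records, so the task that processes it is far more likely to exhaust its share of shuffle memory and spill repeatedly.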
Upvotes: 3
Reputation: 27455
There are different memory arenas in play. For caching, Spark uses spark.storage.memoryFraction (defaults to 60%) of the heap. This is what most of the "free memory" messages are about. It uses spark.shuffle.memoryFraction (defaults to 20%) of the heap for shuffle. I think this is what the spill messages are about. You can disable shuffle spill entirely by setting spark.shuffle.spill to false (defaults to true).
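As a back-of-the-envelope check, the arithmetic below plugs the numbers from the MemoryStore log lines in the question into the two fractions. It also assumes that Spark of this era applies extra safety fractions (0.9 for storage, 0.8 for shuffle) on top; those two values come from my recollection and are not in the logs, so treat the derived heap and shuffle figures as estimates.

```python
MB = 1024 * 1024

# Values copied from the MemoryStore log lines in the question.
max_mem = 2061647216   # maxMem: size of the storage arena in bytes
cur_mem = 1345765155   # curMem: bytes already used by cached blocks
new_block = 872616     # size passed to ensureFreeSpace

# Default fractions mentioned above.
storage_fraction = 0.6
shuffle_fraction = 0.2

# Assumption: additional safety fractions applied by Spark 1.x.
storage_safety = 0.9
shuffle_safety = 0.8


def to_mb(num_bytes):
    return round(num_bytes / MB, 1)


storage_capacity_mb = to_mb(max_mem)                        # 1966.1
free_after_store_mb = to_mb(max_mem - cur_mem - new_block)  # 681.9

# Work backwards from the storage arena to the heap, then forwards to shuffle.
heap_bytes = max_mem / (storage_fraction * storage_safety)
shuffle_pool_mb = to_mb(heap_bytes * shuffle_fraction * shuffle_safety)

print(storage_capacity_mb, free_after_store_mb, shuffle_pool_mb)
```

The first two figures match the "capacity 1966.1 MB" and "free 681.9 MB" log lines, which supports the reading that "free" refers to the storage arena only. Under the stated assumptions the shuffle arena works out to roughly 580 MB, and it is shared among the tasks running at the same time.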
I don't know if this explains all of what you are seeing. See http://spark.apache.org/docs/latest/configuration.html for the description of all such parameters.
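If you want to experiment with these settings, one option is to pass them on the command line with spark-submit's --conf flag. The small helper below just assembles such a command. The helper name, the script name my_job.py, and the 0.4/0.4 split are placeholders for illustration, not recommendations. These property names belong to the Spark version discussed here, and newer releases may ignore them.

```python
def spark_submit_args(app, settings):
    # Build a spark-submit argument list with one --conf pair per setting.
    args = ["spark-submit"]
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Hypothetical values: shift some heap from caching to shuffle.
settings = {
    "spark.storage.memoryFraction": 0.4,
    "spark.shuffle.memoryFraction": 0.4,
}
cmd = spark_submit_args("my_job.py", settings)
print(" ".join(cmd))
```

I would try adjusting the fractions before turning spark.shuffle.spill off, since without spilling the shuffle data has to fit in memory.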
Upvotes: 1