Reputation: 75
As part of a larger problem, I am trying to define an array inside an array like so:
import Data.Array.Repa
type Arr = Array DIM2 Int
arr = force $ fromList (Z :. 5 :. 5) [1..25] :: Arr
combined :: Arr
combined = arr `deepSeqArray`
traverse arr (\_ -> Z :. 4 :. 4 :: DIM2) (\f (Z :. x :. y) ->
let reg = force $ extract f (x,y) (2,2)
in reg `deepSeqArray` sumAll reg)
extract :: (DIM2 -> Int) -> (Int,Int) -> (Int,Int) -> Arr
extract lookup (x0,y0) (width,height) = fromFunction bounds
$ \sh -> offset lookup sh
where
bounds = Z :. width :. height
offset :: (DIM2 -> Int) -> DIM2 -> Int
offset f (Z :. x :. y) = f (Z :. x + x0 :. y + y0)
main = print combined
The extract
function is using fromFunction
and the lookup function provided to it, but it could also use traverse
and arr ! ...
for the same effect. Despite using force
and deepSeqArray
everywhere as early as possible, the console is filled with the message here, followed by the correct result:
Data.Array.Repa: Performing nested parallel computation sequentially. You've probably called the 'force' function while another instance was already running. This can happen if the second version was suspended due to lazy evaluation. Use 'deepSeqArray' to ensure each array is fully evaluated before you 'force' the next one.
While I haven't constructed a version with lists to compare the speeds, in the larger version performance is suffering.
Is this simply a consequence of nested array definitions and thus I should restructure my program for either the inner or outer definition to be lists? Is my extract
function horrible and the cause of the problems?
The tips from this question were useful to get this far but I haven't yet gone crawling through the compiled code.
Upvotes: 4
Views: 460
Reputation: 896
It's because 'print' implicitly forces the array as well. The inner 'force' and 'sumAll' functions invoke parallel computation, but does 'print', so you have nested parallelism. That fact that this is so non-obvious is a great sadness in the Repa 2 API.
Repa 3 addresses these sort of issues by exporting both sequential and parallel versions of 'force' and 'sumAll' etc. It also adds a tag to the array type to indicate whether the array is delayed or manifest. Repa 3 isn't finished yet, but you could use the head version on http://code.ouroborus.net/repa. It should be out shorty after GHC 7.4 later this year.
Here is a Repa 3 version of your example that runs without giving the warning about nested parallelism. Note that 'force' is now 'compute'.
import Data.Array.Repa
arr :: Array U DIM2 Int
arr = fromListUnboxed (Z :. 5 :. 5) [1..25]
combined :: Array U DIM2 Int
combined
= computeP $ traverse arr (\_ -> Z :. 4 :. 4 :: DIM2)
$ \f (Z :. x :. y) -> sumAllS $ extract f (x,y) (2,2)
extract :: (DIM2 -> Int) -> (Int,Int) -> (Int,Int) -> Array D DIM2 Int
extract lookup (x0,y0) (width,height)
= fromFunction bounds
$ \sh -> offset lookup sh
where
bounds = Z :. width :. height
offset :: (DIM2 -> Int) -> DIM2 -> Int
offset f (Z :. x :. y) = f (Z :. x + x0 :. y + y0)
main = print combined
Upvotes: 2