Charles Welton

Reputation: 861

Mutating data with immutable data structures

I would like to implement a particular algorithm, but I'm having trouble finding a good data structure for the job. A simplified version of the algorithm works as follows:

Input: A set of points.
Output: A new set of points.
Step 1: For each point, calculate the closest points in a radius.
Step 2: For each point, calculate a value "v" from the closest points subset.
Step 3: For each point, calculate a new value "w" from the closest points and
        the values "v" from the previous step, i.e., "w" depends on the neighbors
        and on the "v" of each neighbor.
Step 4: Update points.

In C++, I can solve this like this:

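struct Point {
    Vector position;
    double v, w;
    std::vector<Point *> neighbors; // pointers into `points`; valid only while it is not resized
};

std::vector<Point> points = initializePoints();
calculateNeighbors(points); // step 1: fill in each point's neighbors
calculateV(points);         // step 2: e.g. points[0].v = value;
calculateW(points);         // step 3: reads the neighbors' v values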

With a naive immutable structure such as a list of points, I cannot write the value "v" back into the original set of points, so I would need to calculate the neighbors twice. How can I avoid this while keeping the functions pure? Calculating the neighbors is the most expensive part of the algorithm (over 30% of the running time).

PS: For those experienced in numerical methods and CFD, this is a simplified version of the Smoothed Particle Hydrodynamics method.

Update: Changed step 3 so it is clearer.

Upvotes: 2

Views: 1279

Answers (3)

Daniel Wagner

Reputation: 153212

It is a common myth that Haskell doesn't offer mutation at all. In reality, it offers a very special kind of mutation: a value can mutate exactly once, from unevaluated to evaluated. The art of taking advantage of this special kind of mutation is called tying the knot. We will start with a data structure just like the one in your C++ code:

data Vector -- held abstract

data Point = Point
    { position  :: Vector
    , v, w      :: Double
    , neighbors :: [Point]
    }
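To see the knot-tying idea in isolation, here is a minimal sketch that is not part of the answer: two values that refer to each other, built without any explicit mutation. The names Node, ping, pong and walk are made up for illustration. In GHC, each reference starts as an unevaluated thunk and is overwritten with its value the first time it is forced, which is the "mutate exactly once" behaviour described above.

```haskell
-- A node holding a label and a lazy reference to another node.
data Node = Node { label :: String, next :: Node }

-- Two nodes that point at each other. Neither field is evaluated until
-- demanded, so the mutual reference is fine.
ping, pong :: Node
ping = Node "ping" pong
pong = Node "pong" ping

-- Follow the cycle n steps and collect the labels seen.
walk :: Int -> Node -> [String]
walk n node = take n (map label (iterate next node))
```

In GHCi, walk 5 ping gives ["ping","pong","ping","pong","ping"].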

Now, what we're going to do is build an Array Int Point whose neighbors fields contain pointers to other elements within the same array. The key features of Array in the following code are that it is lazy in its elements (it doesn't force them too soon) and that it has fast random access; you can substitute your favorite alternative data structure with these properties if you prefer.
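A standalone sketch (unrelated to points, and not from the answer) of the two Array properties being relied on: listArray stores its elements as unevaluated thunks, so a cell may refer to other cells of the very array being defined, and (!) gives constant-time indexing. The memoised Fibonacci table is a standard illustration; the name fibs is made up.

```haskell
import Data.Array

-- Each cell refers to earlier cells of the same array. Because the
-- elements are lazy thunks, this is well defined, and because the
-- thunks are shared, each cell is computed at most once.
fibs :: Int -> Array Int Integer
fibs n = table
  where
    table = listArray (0, n) (map cell [0 .. n])
    cell 0 = 0
    cell 1 = 1
    cell i = table ! (i - 1) + table ! (i - 2)
```

For example, fibs 10 ! 10 evaluates to 55, and fibs 90 ! 90 returns immediately instead of taking exponential time.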

There are many possible interfaces for the neighbor-finding function. For concreteness, and to keep my own job simple, I will assume you have a function that takes a Vector and a list of Vectors and returns the indices of the neighbors.

findNeighbors :: Vector -> [Vector] -> [Int]
findNeighbors = undefined
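A possible concrete instance, assuming 2D positions and an explicit search radius (for SPH, something like the smoothing length). The V2 type, dist2 and findNeighborsWithin are hypothetical names for this sketch. It is a naive scan: O(n) per query, O(n^2) for all points.

```haskell
-- Hypothetical 2D vector type; the answer keeps Vector abstract.
data V2 = V2 Double Double deriving (Show, Eq)

-- Squared Euclidean distance, so no sqrt is needed per comparison.
dist2 :: V2 -> V2 -> Double
dist2 (V2 x1 y1) (V2 x2 y2) = (x1 - x2) ^ (2 :: Int) + (y1 - y2) ^ (2 :: Int)

-- Indices of all points within radius r of p. If p itself is in the
-- list, its own index is included too.
findNeighborsWithin :: Double -> V2 -> [V2] -> [Int]
findNeighborsWithin r p ps = [i | (i, q) <- zip [0 ..] ps, dist2 p q <= r * r]
```

With V2 in place of Vector, partially applying the radius (findNeighbors = findNeighborsWithin h) gives a function of the shape assumed above.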

Let's also put in place some types for computeV and computeW. For the nonce, we will ask that computeV live up to the informal contract you stated, namely, that it can look at the position and neighbors fields of any Point, but not the v or w fields. (Similarly, computeW may look at anything but the w fields of any Point it can get its hands on.) It is actually possible to enforce this at the type level without too many gymnastics, but for now let's skip that.

computeV, computeW :: Point -> Double
(computeV, computeW) = undefined
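One possible way to get the type-level enforcement mentioned above (a sketch, not necessarily what the answerer had in mind) is to make the types of the v and w fields parameters of Point. A function polymorphic in those parameters cannot do anything with the values it finds there (apart from seq), so the signatures themselves encode the contract. Vector is made concrete here, and findNeighbors uses the same dummy rule as the answer's complete file; both are assumptions for the sake of a runnable example.

```haskell
import Data.Array

newtype Vector = Vector Double deriving Show

-- The types of the v and w fields are type parameters.
data Point a b = Point
  { position  :: Vector
  , v         :: a
  , w         :: b
  , neighbors :: [Point a b]
  }

-- Polymorphic in a and b: may read positions and neighbors, but by
-- parametricity cannot inspect any v or w value.
computeV :: Point a b -> Double
computeV p = sum [x | Point { position = Vector x } <- neighbors p]

-- Knows v is a Double, so may read v fields; still cannot inspect w.
computeW :: Point Double b -> Double
computeW p = sum (map v (neighbors p))

-- Dummy neighbor rule: points closer than 1 (including the point itself).
findNeighbors :: Vector -> [Vector] -> [Int]
findNeighbors (Vector n) vs = [i | (i, Vector n') <- zip [0 ..] vs, abs (n - n') < 1]

-- The same knot-tying construction; only the result type changes.
buildGraph :: [Vector] -> Array Int (Point Double Double)
buildGraph vs = answer
  where
    answer = listArray (0, length vs - 1) (map point vs)
    point pos = this
      where
        this = Point
          { position  = pos
          , v         = computeV this
          , w         = computeW this
          , neighbors = map (answer !) (findNeighbors pos vs)
          }
```

Writing computeV with a look at some neighbor's v, e.g. sum (map v (neighbors p)), is now rejected by the type checker, since it would need a ~ Double.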

Now we are ready to build our (labeled) in-memory graph.

buildGraph :: [Vector] -> Array Int Point
buildGraph vs = answer where
    answer = listArray (0, length vs-1) [point pos | pos <- vs]
    point pos = this where
        this = Point
            { position = pos
            , v = computeV this
            , w = computeW this
            , neighbors = map (answer!) (findNeighbors pos vs)
            }

And that's it, really. Now you can write your

newPositions :: Point -> [Vector]
newPositions = undefined

where newPositions is perfectly free to inspect any of the fields of the Point it's handed, and put all the functions together:

update :: [Vector] -> [Vector]
update = newPositions <=< elems . buildGraph
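The composition above uses (<=<), Kleisli composition from Control.Monad. Since (.) is infixr 9 and (<=<) is infixr 1, it parses as newPositions <=< (elems . buildGraph), and in the list monad that is the same as concatMap newPositions . elems . buildGraph. A small check with made-up toy functions expand and upTo:

```haskell
import Control.Monad ((<=<))

-- Two toy functions returning lists (hypothetical, for illustration).
expand :: Int -> [Int]
expand x = [x, x * 10]

upTo :: Int -> [Int]
upTo n = [1 .. n]

-- In the list monad, f <=< g runs g, applies f to every result, and
-- concatenates the outputs.
viaKleisli, viaConcatMap :: Int -> [Int]
viaKleisli   = expand <=< upTo
viaConcatMap = concatMap expand . upTo
```

Both viaKleisli 3 and viaConcatMap 3 give [1,10,2,20,3,30].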

edit: ...to explain the "special kind of mutation" comment at the beginning: during evaluation, when you demand the w field of a Point, you can expect things to happen in this order: computeW will force the v field; then computeV will force the neighbors field; then the neighbors field will mutate from unevaluated to evaluated; then the v field will mutate from unevaluated to evaluated; then the w field will mutate from unevaluated to evaluated. These last three steps look very similar to the three mutation steps of your C++ algorithm!

double edit: I decided I wanted to see this thing run, so I instantiated all the things held abstract above with dummy implementations. I also wanted to see it evaluate things only once, since I wasn't even sure I'd done it right! So I threw in some trace calls. Here's a complete file:

import Control.Monad
import Data.Array
import Debug.Trace

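-- Wrap a value so that a message is printed when it is first forced.
announce :: String -> Vector -> a -> a
announce s (Vector pos) = trace $ "computing " ++ s ++ " for position " ++ show pos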

data Vector = Vector Double deriving Show

data Point = Point
    { position  :: Vector
    , v, w      :: Double
    , neighbors :: [Point]
    }

findNeighbors :: Vector -> [Vector] -> [Int]
findNeighbors (Vector n) vs = [i | (i, Vector n') <- zip [0..] vs, abs (n - n') < 1]

computeV, computeW :: Point -> Double
computeV (Point pos _ _ neighbors) = sum [n | Point { position = Vector n } <- neighbors]
computeW (Point pos v _ neighbors) = sum [v | Point { v = v } <- neighbors]

buildGraph :: [Vector] -> Array Int Point
buildGraph vs = answer where
    answer = listArray (0, length vs-1) [point pos | pos <- vs]
    point pos = this where { this = Point
        { position  = announce "position" pos $ pos
        , v         = announce "v" pos $ computeV this
        , w         = announce "w" pos $ computeW this
        , neighbors = announce "neighbors" pos $ map (answer!) (findNeighbors pos vs)
        } }

newPositions :: Point -> [Vector]
newPositions (Point { position = Vector n, v = v, w = w }) = [Vector (n*v), Vector w]

update :: [Vector] -> [Vector]
update = newPositions <=< elems . buildGraph

and a run in ghci:

*Main> length . show . update . map Vector $ [0, 0.25, 0.75, 1.25, 35]
computing position for position 0.0
computing v for position 0.0
computing neighbors for position 0.0
computing position for position 0.25
computing position for position 0.75
computing w for position 0.0
computing v for position 0.25
computing neighbors for position 0.25
computing v for position 0.75
computing neighbors for position 0.75
computing position for position 1.25
computing w for position 0.25
computing w for position 0.75
computing v for position 1.25
computing neighbors for position 1.25
computing w for position 1.25
computing position for position 35.0
computing v for position 35.0
computing neighbors for position 35.0
computing w for position 35.0
123

As you can see, each field is computed at most once for each position.

Upvotes: 6

Kostia R

Reputation: 2565

I think you should either use a Map (or HashMap) to store the v's (and w's) computed from your Points separately, or use mutable variables to mirror your C++ algorithm. The first method is more "functional": for example, you can easily add parallelism to it, since all data is immutable. It should be a little slower, though, since you have to do a lookup (hashing the key, for a HashMap) every time you need the v of a point.
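A minimal sketch of the Map-based approach, assuming 1D points and the same dummy formulas as the complete file in the first answer (v sums the neighbors' positions, w sums the neighbors' v). All names here are hypothetical. The neighbor indices are computed once and reused; the v's live in a Data.Map keyed by point index, so reading a neighbor's v is an O(log n) lookup rather than a field access.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical 1D positions.
type Pos = Double

-- Naive neighbor search returning indices (each point includes itself).
neighbourIdx :: [Pos] -> [[Int]]
neighbourIdx ps = [[j | (j, q) <- zip [0 ..] ps, abs (p - q) < 1] | p <- ps]

-- Result: for each point index, its (v, w) pair.
step :: [Pos] -> Map.Map Int (Double, Double)
step ps = Map.fromList [(i, (vMap Map.! i, wOf is)) | (i, is) <- zip [0 ..] nbrs]
  where
    nbrs   = neighbourIdx ps                 -- computed once, used twice
    posMap = Map.fromList (zip [0 ..] ps)
    vMap   = Map.fromList [(i, sum [posMap Map.! j | j <- is]) | (i, is) <- zip [0 ..] nbrs]
    wOf is = sum [vMap Map.! j | j <- is]    -- w reads the neighbors' v
```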

Upvotes: 0

Chris Taylor

Reputation: 47402

Can you do something like this? Given the following type signatures

calculateNeighbours :: [Point] -> [[Point]]

calculateV :: [Point] -> Double

calculateW :: [Point] -> Double -> Double

you can write

algorithm :: [Point] -> [(Point, Double, Double)]
algorithm pts =                             -- pts  :: [Point]
    let nbrs = calculateNeighbours pts      -- nbrs :: [[Point]]
        vs   = map calculateV nbrs          -- vs   :: [Double]
        ws   = zipWith calculateW nbrs vs   -- ws   :: [Double]
     in zip3 pts vs ws                      --      :: [(Point,Double,Double)]

This calculates the lists of neighbours only once, and re-uses the value in the computations for v and w.
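Since the question's step 3 says w depends on the v of each neighbor, calculateW needs the neighbors' v values rather than only the point's own. One way to keep this structure (a variant, not the answerer's code) is to have the neighbor step return indices and put the v's in an Array for O(1) lookup. Points are assumed to be 1D Doubles and the formulas are dummies.

```haskell
import Data.Array

-- Hypothetical 1D points, for illustration only.
type Point = Double

-- Neighbor indices (self included), computed once for all points.
calculateNeighbours :: [Point] -> [[Int]]
calculateNeighbours pts = [[j | (j, q) <- zip [0 ..] pts, abs (p - q) < 1] | p <- pts]

-- Dummy v: sum of the neighbors' positions.
calculateV :: [Point] -> Double
calculateV = sum

-- Dummy w: sum of the neighbors' v values.
calculateW :: [Double] -> Double
calculateW = sum

algorithm :: [Point] -> [(Point, Double, Double)]
algorithm pts =
  let n     = length pts
      ptArr = listArray (0, n - 1) pts :: Array Int Point
      nbrs  = calculateNeighbours pts                     -- [[Int]], computed once
      vs    = map (calculateV . map (ptArr !)) nbrs       -- [Double]
      vArr  = listArray (0, n - 1) vs :: Array Int Double -- v by index
      ws    = map (calculateW . map (vArr !)) nbrs        -- [Double]
   in zip3 pts vs ws
```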

If this isn't what you want, can you elaborate a little more?

Upvotes: 3
