Reputation: 861
I would like to implement a particular algorithm, but I'm having trouble finding a good data structure for the job. A simpler version of the algorithm works like the following:
Input: A set of points.
Output: A new set of points.
Step 1: For each point, calculate the closest points in a radius.
Step 2: For each point, calculate a value "v" from the closest points subset.
Step 3: For each point, calculate a new value "w" from the closest points and
the values "v" from the previous step, i.e., "w" depends on the neighbors
and the "v" of each neighbor.
Step 4: Update points.
In C++, I can solve this like this:
struct Point {
    Vector position;
    double v, w;
    std::vector<Point *> neighbors;
};
std::vector<Point> points = initializePoints();
calculateNeighbors(points);
calculateV(points); // points[0].v = value; for example.
calculateW(points);
In Haskell, with a naive structure such as a list of points, I cannot write the value "v" back into the original set of points, so I would need to calculate the neighbors twice. How can I avoid this and keep the functions pure? Calculating the neighbors is the most expensive part of the algorithm (over 30% of the time).
PS.: For those experienced in numerical methods and CFD, this is a simplified version of the Smoothed Particle Hydrodynamics method.
Update: Changed step 3 so it is clearer.
Upvotes: 2
Views: 1279
Reputation: 153212
It is a common myth that Haskell doesn't offer mutation at all. In reality, it offers a very special kind of mutation: a value can mutate exactly once, from un-evaluated to evaluated. The art of taking advantage of this special kind of mutation is called tying the knot. We will start with a data structure just like the one in your C++ code:
data Vector -- held abstract
data Point = Point
    { position :: Vector
    , v, w :: Double
    , neighbors :: [Point]
    }
Now, what we're going to do is build an Array Int Point whose neighbors fields contain pointers to other elements within the same array. The key features of Array in the following code are that it's lazy in its elements (it doesn't force them too soon) and has fast random access; you can substitute your favorite alternate data structure with these properties if you prefer.
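To see those two properties at work on something smaller first, here is a minimal sketch of an array whose elements are defined in terms of other elements of the same array. The name fibs and the Fibonacci example are made up for illustration and are not part of the algorithm in question.

```haskell
import Data.Array

-- Hypothetical example: a table of Fibonacci numbers in which every
-- element refers to earlier elements of the very same array.
fibs :: Int -> Array Int Integer
fibs n = table
  where
    -- listArray does not force the elements, so the references from
    -- fib back into table are safe.
    table = listArray (0, n) (map fib [0 .. n])
    fib 0 = 0
    fib 1 = 1
    fib i = table ! (i - 1) + table ! (i - 2)
```

Each element starts out unevaluated and is computed at most once, the first time something demands it; for instance, fibs 10 ! 10 evaluates to 55.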
There are lots of choices for the interface of the neighbor-finding function. For concreteness and to make my own job simple, I will assume you have a function that takes a Vector and a list of Vectors and gives the indices of the neighbors.
findNeighbors :: Vector -> [Vector] -> [Int]
findNeighbors = undefined
Let's also put in place some types for computeV and computeW. For the nonce, we will ask that computeV live up to the informal contract you stated, namely, that it can look at the position and neighbors fields of any Point, but not the v or w fields. (Similarly, computeW may look at anything but the w fields of any Point it can get its hands on.) It is actually possible to enforce this at the type level without too many gymnastics, but for now let's skip that.
computeV, computeW :: Point -> Double
(computeV, computeW) = undefined
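For the curious, one possible way to get that type-level enforcement is to make the types of the v and w fields parameters of the point type. This is only a sketch under assumptions, not necessarily the only or the intended approach: PointOf and its parameters a and b are hypothetical names, and Vector and findNeighbors are filled in with dummy one-dimensional implementations so the block stands alone.

```haskell
import Data.Array

newtype Vector = Vector Double deriving Show

-- Hypothetical variant of Point: the types of v and w are parameters.
data PointOf a b = PointOf
  { position  :: Vector
  , v         :: a
  , w         :: b
  , neighbors :: [PointOf a b]
  }

-- Polymorphic in both a and b: cannot read a number out of v or w.
computeV :: PointOf a b -> Double
computeV p = sum [n | PointOf { position = Vector n } <- neighbors p]

-- Knows that v is a Double, but stays polymorphic in the type of w.
computeW :: PointOf Double b -> Double
computeW p = sum (map v (neighbors p))

-- Dummy neighbor search: indices of all points closer than 1.
findNeighbors :: Vector -> [Vector] -> [Int]
findNeighbors (Vector n) vs =
  [i | (i, Vector n') <- zip [0 ..] vs, abs (n - n') < 1]

buildGraph :: [Vector] -> Array Int (PointOf Double Double)
buildGraph vs = answer
  where
    answer = listArray (0, length vs - 1) (map point vs)
    point pos = this
      where
        this = PointOf
          { position  = pos
          , v         = computeV this
          , w         = computeW this
          , neighbors = map (answer !) (findNeighbors pos vs)
          }
```

With these signatures, a computeV that tried to add up the v fields of its neighbors would be rejected by the type checker. A polymorphic function could still force a field with seq, so this guards against accidents rather than against determined misuse.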
Now we are ready to build our (labeled) in-memory graph.
buildGraph :: [Vector] -> Array Int Point
buildGraph vs = answer where
    answer = listArray (0, length vs-1) [point pos | pos <- vs]
    point pos = this where
        this = Point
            { position = pos
            , v = computeV this
            , w = computeW this
            , neighbors = map (answer!) (findNeighbors pos vs)
            }
And that's it, really. Now you can write your
newPositions :: Point -> [Vector]
newPositions = undefined
where newPositions is perfectly free to inspect any of the fields of the Point it's handed, and put all the functions together:
update :: [Vector] -> [Vector]
update = newPositions <=< elems . buildGraph
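In case the <=< is unfamiliar: in the list monad, f <=< g feeds every result of g to f and concatenates the outputs, so update amounts to concatMap newPositions . elems . buildGraph. A small standalone sketch of that equivalence, using made-up functions (range and expand are hypothetical and unrelated to the code above):

```haskell
import Control.Monad ((<=<))

-- Hypothetical helpers, only here to illustrate (<=<) on lists.
range :: Int -> [Int]
range n = [1 .. n]

expand :: Int -> [Int]
expand x = [x, x * 10]

-- Kleisli composition in the list monad ...
viaKleisli :: Int -> [Int]
viaKleisli = expand <=< range

-- ... gives the same result as concatMap after ordinary composition.
viaConcatMap :: Int -> [Int]
viaConcatMap = concatMap expand . range
```

Both viaKleisli 3 and viaConcatMap 3 evaluate to [1,10,2,20,3,30].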
edit: ...to explain the "special kind of mutation" comment at the beginning: during evaluation, you can expect, when you demand the w field of a Point, that things will happen in this order: computeW will force the v field; then computeV will force the neighbors field; then the neighbors field will mutate from unevaluated to evaluated; then the v field will mutate from unevaluated to evaluated; then the w field will mutate from unevaluated to evaluated. These last three steps look very similar to the three mutation steps of your C++ algorithm!
double edit: I decided I wanted to see this thing run, so I instantiated all the things held abstract above with dummy implementations. I also wanted to see it evaluate things only once, since I wasn't even sure I'd done it right! So I threw in some trace calls. Here's a complete file:
import Control.Monad
import Data.Array
import Debug.Trace
announce s (Vector pos) = trace $ "computing " ++ s ++ " for position " ++ show pos
data Vector = Vector Double deriving Show
data Point = Point
    { position :: Vector
    , v, w :: Double
    , neighbors :: [Point]
    }
findNeighbors :: Vector -> [Vector] -> [Int]
findNeighbors (Vector n) vs = [i | (i, Vector n') <- zip [0..] vs, abs (n - n') < 1]
computeV, computeW :: Point -> Double
computeV (Point pos _ _ neighbors) = sum [n | Point { position = Vector n } <- neighbors]
computeW (Point pos v _ neighbors) = sum [v | Point { v = v } <- neighbors]
buildGraph :: [Vector] -> Array Int Point
buildGraph vs = answer where
    answer = listArray (0, length vs-1) [point pos | pos <- vs]
    point pos = this where { this = Point
        { position = announce "position" pos $ pos
        , v = announce "v" pos $ computeV this
        , w = announce "w" pos $ computeW this
        , neighbors = announce "neighbors" pos $ map (answer!) (findNeighbors pos vs)
        } }
newPositions :: Point -> [Vector]
newPositions (Point { position = Vector n, v = v, w = w }) = [Vector (n*v), Vector w]
update :: [Vector] -> [Vector]
update = newPositions <=< elems . buildGraph
and a run in ghci:
*Main> length . show . update . map Vector $ [0, 0.25, 0.75, 1.25, 35]
computing position for position 0.0
computing v for position 0.0
computing neighbors for position 0.0
computing position for position 0.25
computing position for position 0.75
computing w for position 0.0
computing v for position 0.25
computing neighbors for position 0.25
computing v for position 0.75
computing neighbors for position 0.75
computing position for position 1.25
computing w for position 0.25
computing w for position 0.75
computing v for position 1.25
computing neighbors for position 1.25
computing w for position 1.25
computing position for position 35.0
computing v for position 35.0
computing neighbors for position 35.0
computing w for position 35.0
123
As you can see, each field is computed at most once for each position.
Upvotes: 6
Reputation: 2565
I think you should either use a Map (or HashMap) to store the v's (and w's) computed from your Points separately, or use mutable variables to mirror your C++ algorithm. The first method is more "functional", e.g. you can easily add parallelism to it since all the data is immutable, but it should be a little slower, since you'll have to compute a hash each time you need to look up the v for a point.
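A minimal sketch of the first method, under stated assumptions: points are identified by a hypothetical PointId, positions are one-dimensional, the neighbor search is brute force with radius 1, and Data.Map from the containers package stands in for a HashMap. All function names here are made up for illustration.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical key type: each point is identified by its index.
type PointId = Int

-- Step 1: neighbor ids for every point, computed once.
neighborsOf :: Map PointId Double -> Map PointId [PointId]
neighborsOf pos = Map.map near pos
  where
    near x = [j | (j, y) <- Map.toList pos, abs (x - y) < 1]

-- Step 2: v for every point, from the positions of its neighbors.
computeVs :: Map PointId Double -> Map PointId [PointId] -> Map PointId Double
computeVs pos = Map.map (\js -> sum [pos Map.! j | j <- js])

-- Step 3: w for every point, from the v of each of its neighbors.
computeWs :: Map PointId [PointId] -> Map PointId Double -> Map PointId Double
computeWs nbrs vs = Map.map (\js -> sum [vs Map.! j | j <- js]) nbrs

-- Runs the three steps and returns (v, w) for each point, in id order.
run :: [Double] -> [(Double, Double)]
run xs = zip (Map.elems vs) (Map.elems ws)
  where
    pos  = Map.fromList (zip [0 ..] xs)
    nbrs = neighborsOf pos
    vs   = computeVs pos nbrs
    ws   = computeWs nbrs vs
```

The neighbor map is built once and shared by both later steps, and every function stays pure; the price is one map lookup per neighbor when reading v.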
Upvotes: 0
Reputation: 47402
Can you do something like this? Given the following type signatures
calculateNeighbours :: [Point] -> [[Point]]
calculateV :: [Point] -> Double
calculateW :: [Point] -> Double -> Double
you can write
algorithm :: [Point] -> [(Point, Double, Double)]
algorithm pts =                            -- pts  :: [Point]
    let nbrs = calculateNeighbours pts     -- nbrs :: [[Point]]
        vs   = map calculateV nbrs         -- vs   :: [Double]
        ws   = zipWith calculateW nbrs vs  -- ws   :: [Double]
    in  zip3 pts vs ws                     --      :: [(Point,Double,Double)]
This calculates the lists of neighbours only once, and re-uses the value in the computations for v and w.
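As a concrete illustration, here is a runnable sketch that fills in those signatures with dummy implementations. The one-dimensional Point type, the radius of 1, and the bodies of the three calculate functions are all assumptions made up for this example. Note that with these signatures calculateW only sees the point's own v, not the v of each neighbor.

```haskell
-- Hypothetical 1-D point, just enough to run the pipeline.
newtype Point = Point Double deriving (Eq, Show)

-- Dummy neighbor search: all points closer than 1 (brute force).
calculateNeighbours :: [Point] -> [[Point]]
calculateNeighbours pts =
  [[q | q@(Point y) <- pts, abs (x - y) < 1] | Point x <- pts]

-- Dummy v: sum of the neighbor positions.
calculateV :: [Point] -> Double
calculateV nbrs = sum [x | Point x <- nbrs]

-- Dummy w: the point's own v scaled by its number of neighbors.
calculateW :: [Point] -> Double -> Double
calculateW nbrs v = v * fromIntegral (length nbrs)

algorithm :: [Point] -> [(Point, Double, Double)]
algorithm pts =
  let nbrs = calculateNeighbours pts      -- computed once, shared below
      vs   = map calculateV nbrs
      ws   = zipWith calculateW nbrs vs
  in  zip3 pts vs ws
```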
If this isn't what you want, can you elaborate a little more?
Upvotes: 3