Reputation: 92805
I've written a simple physics modeler that allows me to bounce balls around the screen. You can click and drag to launch a ball, or you can generate balls hundreds at a time and watch them interact with each other.
It has been a fun little program to work on and I want to take it further if I can. I know they say pre-mature optimization is the root of all evil but I'm starting to hit actual performance barriers and I want to know if anyone who is experienced in game/simulator development can help me out.
Problem:
Right now, my program chokes if you add too many balls (it can't seem to handle more than 800 on my machine). If you do, the simulation is no longer realistic and all the balls overlap each other on the bottom.
The problem is in collision detection. In the most naive case, collision detection is an O(N^2) problem. Each ball checks every other ball. This gets poor performance pretty quickly (after even 100 balls you'll be doing 10k collision checks per loop cycle).
If you look here, you can see a screenshot of where I have added several hundred balls. The simulator can't keep up and they begin to overlap each other.
Currently, I detect collisions by looking for overlapping balls. If I find two balls that are overlapping, I separate them by their minimum translation distance (MTD), or push them apart. I then use a simple physics equation to adjust their impulse vectors and off they go in different directions after the collision.
It works great except if there are too many balls the minimum translation distance becomes noticeable. They begin overlapping by large amounts and constantly jostle each other on the bottom. It gets worse the more I increase "gravity". The pressure on them is increased and the amount they are compressed/overlap each other increases.
Again, I have no issues until I hit a sizable N number of balls.
Current Optimization Method:
Collision Detection -
Sweep and prune - (aka. Sort and Sweep)
I use an insertion sort on my balls each loop cycle along their x axis. Because of the nature of Insertion Sort I can exploit the temporal coherence of my simulator. Frame to frame, the balls positions change only slightly so the sort doesn't have much work to do. This brings Linear Sorts amortized runtime to O(N) or linear rather than its average runtime of O(N^2).
Since the balls are sorted, I do a couple pre-checks in my second loop before checking collisions. Now I only have to check the balls near each other (because they are sorted along the x-axis), and I break out of the second loop anytime I check a ball against another ball whose xmin is greater than the xmax of the current ball. So it skips thousands of checks.
When I implemented this, it brought a huge speed improvement. However, I would still like to be able to handle more than 600-800 balls. I've read about Physics Engines that easily handle collision detection between 10k objects simultaneously, so I would like to think I could reach 1-2k with a little work.
After running a profiler it came out that Collision Detection was eating up about 55% of my time while rendering was eating up about 45%. So, these are my two most expensive costs.
Question:
Can you think of any better algorithms or techniques that would allow my simulator to be able to handle more balls?
Relevant Code:
Entire Project:
svn checkout http://simucal-projects.googlecode.com/svn/ballbounce/trunk/
or, click here to browse the files manually in your browser.
Sections of Interest:
Upvotes: 35
Views: 4704
Reputation: 281
Simucal. I realize that I'm answering about a year and a half too late, but I'd still like to write my thoughts down.
I recently wrote a question concerning the same problem as your balls overlapping, which actually used the same algorithm you used. I didn't see your question when I submitted it, so my bad.
First of all,
The optimization: I use a simple grid partitioning system where the whole screen is divided into cells that are the width and height of the largest ball's diameter. The grid keeps track of which cell each ball is in, so when there needs to be a collision check, I use a nearBy() function which fills a list of the IDs of every ball that's in cells adjacent to the one I'm checking.
The grid method works great, and I can have up to 2000 balls before lagging. Removing and adding balls to the grid is a bit of a pain though, but that's just the way I implemented it (the grid based heavily on a ball list and the index position of every ball).
In the future, I'd like to look into other methods of partitioning and optimizing collision checks.
The Overlapping: A lot of the answers here and other places suggest that you recursively correct collisions every frame. This actually worked for me to an extent. With good enough optimization, you could get away with 2 or 3 checks per frame which seems to prevent some of the overlapping mess. It's not perfect though. I suspect that improving the accuracy (with fancy words like interpolation and better integration) can help with the jittering and overlapping.
I thought about ordering the collision checks based on priority (highest being the ones touching the walls, then ones touching those are one step down on the priority list, etc), and factoring that into the minimum translation distance. MyPhysicsLab talks about handling multiple simultaneous collisions, but I've yet to look into that.
If you ever find anything out, please update! I'm working on the exact same problems, and ball simulators seem to be pretty popular. This kind of experience can come in handy if we were to ever move on to rigid body physics.
Thanks.
Upvotes: 1
Reputation: 12342
Unless I've missed something, the balls overlapping are the result of a bug, not inefficient code. Inefficient code would just cause the animation to run slow.
I'd suggest the problem is due to the iterative approach of the ball to ball collision detection. Currently it looks like you're only considering the result for a ball to ball collision. When multiple balls are touching a single ball the results seem to be undefined, and the chances of this happening increase with larger gravity or number of balls.
Upvotes: 0
Reputation: 3413
Try this:
Divide your rectangle up into N*M squares, such that the squares are slightly wider than the radius of a ball. It might be a good idea to have the squares overlap the edges of your rectangle, rather than fit neatly within it.
Make an array of BitSet. Don't use an Bitset[M][N], just new Bitset[M*N] - a little multiplication isn't going to hurt you.
Identify each ball with a number. When you position a ball at a location, set the bit in the bitset for that square and the 8 squares around it (to make this easier, extend your array of squares so that they extend one beyond the edge of the rectangle - that way you don't have to clip.)
Run through the squares. For each pair of balls in each square mark that pair as being a potential collision. To do this, create a bitset and - given that you have H balls and balls A and B occupy the same square - set the bits A+BH and AH+B.
Looing through the potential collisions is now easy, because BitSet includes a method that says "find me the next bit after this one that is set". Remember that each bit is double counted, so when bit Q is detected as being set, be sure to unset bit (Q%H)*H + (Q/H)
- which is the other bit of the pair.
Alternatively: you can collapse that array of collisions pretty easily. A collision between A and B - given that A > B can be marked by setting bit A * (A-1) / 2 + B
. This has the advantage that you don't need to care about the total number of balls.
Actually: forget it. just use this class which I wrote as an exercise:
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
public class PairSet extends BitSet implements
Iterable<PairSet.Pair> {
public static class Pair implements Comparable<Pair> {
public final int a;
public final int b;
private Pair(int a, int b) {
if (a < 0 || b < 0 || a == b) { throw new IllegalArgumentException(
"Pair(" + a + "," + b + ")"); }
if (a > b) {
this.a = a;
this.b = b;
} else {
this.a = b;
this.b = a;
}
}
public String toString() {
return "Pair(" + a + "," + b + ")";
}
public int hashCode() {
return a * (a - 1) / 2 + b;
}
public boolean equals(Object o) {
return o instanceof Pair
&& hashCode() == ((Pair) o).hashCode();
}
public int compareTo(Pair o) {
return hashCode() - o.hashCode();
}
}
PairSet() {}
PairSet(BitSet z) {
or(z);
}
PairSet(Iterable<Pair> z) {
for (Pair p : z)
set(p);
}
public void set(Pair p) {
set(p.a, p.b);
}
public void clear(Pair p) {
clear(p.a, p.b);
}
public void set(int a, int b) {
if (a < 0 || b < 0 || a == b) { throw new IllegalArgumentException(
"add(" + a + "," + b + ")"); }
if (a > b) {
set(a * (a - 1) / 2 + b);
} else {
set(b * (b - 1) / 2 + a);
}
}
public void clear(int a, int b) {
if (a < 0 || b < 0 || a == b) { throw new IllegalArgumentException(
"add(" + a + "," + b + ")"); }
if (a > b) {
clear(a * (a - 1) / 2 + b);
} else {
clear(b * (b - 1) / 2 + a);
}
}
public Iterator<Pair> iterator() {
return new Iterator<Pair>() {
int at = -1;
int triangle = 0;
int a = 0;
public boolean hasNext() {
return nextSetBit(at + 1) != -1;
}
public Pair next() {
int nextat = nextSetBit(at + 1);
if (nextat == -1) { throw new NoSuchElementException(); }
at = nextat;
while (triangle <= at) {
triangle += a++;
}
return new Pair(a - 1, at - (triangle - a) - 1);
}
public void remove() {
throw new UnsupportedOperationException();
}
};
}
}
And that will nicely keep track of your potential collisions. The psudeocode is then
SW = width of rectangle
SH = height of rectangle
R = radius of balls + 1 // +1 is a fudge factor.
XS = number of squares across = SW/R + 4; // the +4 adds some slop
YS = number of squares hight = SH/R + 4; // the +4 adds some slop
int sx(Point2D.Float p) // the square into which you put a ball at x
// never returns a number < 1
:= (int)((p.x-R/2)/R) + 2;
int sy(Point2D.Float p) // the square into which you put a ball at y
// never returns a number < 1
:= (int)((p.y-R/2)/R) + 2;
Bitset[] buckets = new BitSet[XS*YS];
{for(int i: 0; i<buckets.length; i++) bukets[i] = new BitSet();}
BitSet bucket(int x, int y) {return bucket[y*XS + x]}
BitSet bucket(Point2D.Float p) {return bucket(sy(p),sx(p));}
void move(int ball, Point2D.Float from, Point2D.Float to) {
if bucket(from) == bucket(to) return;
int x,y;
x = sx(from); y=sy(from);
for(int xx==-1;xx<=1; xx++)
for(int yy==-1;yy<=1; yy++)
bucket(sx+xx, sy+yy).clear(ball);
x = sx(to); y=sy(to);
for(int xx==-1;xx<=1; xx++)
for(int yy==-1;yy<=1; yy++)
bucket(sx+xx, sy+yy).set(ball);
}
PointSet findCollisions() {
PointSet pp = new PointSet();
for(BitSet bb: buckets) {
int a;
int prev_a;
for(prev_a = -1; (a = bb.nextSetBit(prev_a+1))!=-1; prev_a=a) {
int b;
int prev_b;
for(prev_b = a; (b = bb.nextSetBit(prev_b+1))!=-1; prev_b=b) {
pp.add(a,b);
}
}
return pp;
}
Upvotes: 1
Reputation:
Your main problem is your collision resolution algorithm -- you can speed up the drawing and the collision detection, but that won't stop your balls collapsing into each other. The problem you have is much more difficult to solve cleanly than it appears; it is possible to do it right, though.
In your case, in the configuration that fails (big bin-o'-balls) your objects should really be sliding against each other, not bouncing. Sliding is a continuous constraint, which is different from the kind of impulse-based constraints your implementation handles.
The simplest thing you can do to prevent the collapse is to recheck all colliding objects to make sure they aren't interpenetrating after you do the collision handling -- potentially iterating as many times as necessary until none of them violate the constraints. This will of course take longer -- possibly much longer (and even raises the possibility of an infinite loop, though maybe not in that particular situation...).
There are good algorithms that will make your simulation behave, but they are more complicated than your impulse-based system. Generally, they involve considering mutually interacting bodies (like the balls in the bin) as a whole, and adjusting their collective configuration to satisfy their mutual constraints.
You should look for works (books, papers, web sites) about multibody simulation. I can't say which is likely to be most useful for your purposes; if I were you, I would start by visiting a good university library and looking through whatever books they have on the subject. You should be prepared for some serious math; if terms like "Lagrange multipliers" make you break out in hives, look out 8^). Basically, if you go this route, you will likely learn a lot of math, and no small amount of physics.
Upvotes: 4
Reputation: 1213
I've done something very much like this on the iPhone, and it uses the accelerometer to allow you to tilt the balls around, and the touch screen to add and delete balls. It can handle at least 30 balls before it starts to bog down noticeably.
One optimization I did early on is to inline the math. Originally I had a separate "vector" class, and it could only handle 10-12 balls before it turned into a slide show. Profiling showed it was spending a lot of time allocating and deallocating vectors.
Also, I don't separate the balls once they hit, I just bounce the vectors. They don't seem to ever overlap, and when they do it's pretty obvious that it's because they all got jammed together on the bottom.
I'm not quite ready to release the code yet, I've got some polishing to do, then I'll put it on the store.
Upvotes: 0
Reputation: 15474
I've been following the development of Chipmunk since it began and from my understanding the answer to optimizing is in the data structures. (isn't it always?)...
The data structure Chipmunk uses to achieve this is a Spacial Hash.
Upvotes: 2
Reputation:
Once a ball is completely surrounded by other balls stop considering it for collision detection. Just from looking at your screenshot it seems that only the "surface" balls should be considered. Why check the ball 6 balls deep that nothing can possibly collide with? This would greatly reduce the number of potential collisions.
Upvotes: 4
Reputation: 7819
The simple way is don't test Object vs Object collisions, populate an array with the center point of each ball. Then, from each center scan a a square of size 4*radius centered on that point (you can optimize this a bit by not using a square, at the expense of making the code more complex). If there's another center in this square, only then do you check to see if they're within 2*radius of each other (and therefore colliding). You can further optimize this by reducing the resolution, and rounding the ball position, reducing the number of checks you need to do.
Another way, is to divide your space into a grid, and store the objects in grid regions. You only need to check for collisions between objects in adjacent grids.
Upvotes: 15
Reputation: 69855
The hard core physics engines use vectorization of floats, which gives a x16 boost on current hardware if lucky, and way more on specialized hardware. Larrabee for example can handle 1024 simultaneous calculations for a theretocal x1024 boost in math processing (but it needs this because it's also GPU)
Haven't looked at the code yet, but are you using math optimizations like fast square root and binary absolutes, or bitwise sign flipping? These things alone don't help, but when your churning large amounts of data, these techniques reroute some math to the main CPU, freeing the FPU, basically giving you a larger throughput.
Also GCC's SIMD code generation litteraly sucks, I've seen upto a 16 fold increase by using VC or IntelCompiler, that means, if you've paid attention, that GCC wasn't using any SIMD instructions at all!
Also those allegded 10k collisions aren't in such close quarters as you sim, so it's not directly comparable.
Upvotes: 4
Reputation: 308823
It's starting to sound almost like a slightly compressible fluid in a gravitational field at this point. Computational fluid dynamics ideas, using an Eulerian viewpoint, might help. If you create a grid with a scale such that only one ball at a time can occupy each cell, you can write conservation of mass, momentum, and energy balances for each cell and track the motion of the balls that way.
Upvotes: 2
Reputation: 19161
Maybe the problem is that there are so many interactions as the balls "pile up"? If I were doing this, I'd try the following:
Upvotes: 1
Reputation: 93565
Keep track of nearby balls -
Just like the insertion sort is optimal due to the minimal change per frame, you can use the same property to keep track ball 'neighbors'
Each ball can only interact with a possible 6 other balls at the most. You can probably run an algorithm every 10 frames or so that figures out which neighbors each ball has, and then use that information for the next ten frames to calculate the actual collisions.
You can also follow the vectors of each ball, draw virtual lines, and see which lines cross in the next 3-5 frames and only calculate collisions that could possibly happen (even though few will happen due to time).
Just as you sorted by the x axis you can 'group' balls in subdivisions inside the main window. When a ball is inside a subdivision by at least one ball diameter, it only need to look at the balls in the same subdivision. If it's closer to a border or two you need to look at one or two other subdivisions, but it should be fewer calculations.
Further, those subdivisions can be dynamically located so the average subdivision has only 100 balls - there's not need to have subdivisions of the same size with different numbers of balls.
Depending on how crowded a subdivision is (balls per square unit) you might try different algorithms - a sparsely filled box only needs vector calculations to predict collisions, whereas a densely packed subdivision might only need some sort of optimized hexagraphic calculation. Sparse boxes may not need collision detection for many frames (since overlap won't be noticed as much as in densely populated boxes).
The energy density of a given box can be determined - a very sparse box with low energy balls needs fewer collision calculations than a sparse box with high energy.
-Adam
Upvotes: 10
Reputation: 24480
I think its time to measure performance to verify where the bottleneck really is. You didn't need to do measurement earlier because there was an obvious algorithm problem. Now there is still room to improve the algorithm, but are you sure thats the biggest problem? Measure how many comparisons you do per ball now. Is it something small? If so then algorithmic changes may not be the best next step.
Compare how long it takes to determine every balls position vs. the time it takes to actually draw them. It could be the later is now the bottleneck, and you should focus your efforts on the rendering code rather than the physics engine.
Upvotes: 0