So I've been tacking some things onto my game, and after adding one particular thing the game got pretty laggy. I did some tests using pygame.time.get_ticks() to see where the time was being spent in my loop, and found that about 90% of the time was spent in two places: 1. Drawing all my sprites to the screen. 2. Drawing my ability manager, which is just drawing/blitting some images.
I got confused when removing my convert() and convert_alpha() significantly improved performance in my ability manager, and then removing the converts in drawing my sprites did not seem to affect performance.
Does anyone have any idea why convert might slow things down? The docs say it's the best way to go. Also, why might it help in one area and not another?
Edit: Some numbers to show my tests. Removing converts for #2, the ability manager drawing, decreased the average time to draw them from roughly 80 milliseconds to roughly 45 milliseconds. Removing or adding converts for #1, drawing sprites to the screen, hardly affects the time at all. The effect stays within plus or minus 5. This small change may not be the result of removing converts, so my question should mainly focus on "Why does removing convert help so much in the ability manager drawing?", and only a little on why it might help so much in one area and not another.
Caveat: I don't use pygame, but I've written scalers and converters, professionally, for high-def video, so I'm drawing on that experience.
I did look up the documentation here: http://www.pygame.org/docs/ref/surface.html#pygame.Surface.convert
From that:
If no arguments are passed the new Surface will have the same pixel format as the display Surface. This is always the fastest format for blitting.
In other words, if the format matches it goes fast. But, otherwise, it must do a conversion [which will run slower].
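To make the "matching format versus conversion" distinction concrete, here is a minimal sketch in plain Python. It is not how pygame or SDL implement blitting; the function names and the choice of a 16-bit RGB565 source going to a 24-bit 0xRRGGBB destination are assumptions picked only for illustration.

```python
def blit_same_format(dst_row, src_row, x):
    """Formats match: the whole row is copied as one slice (memcpy-like)."""
    dst_row[x:x + len(src_row)] = src_row


def rgb565_to_rgb888(pixel):
    """Unpack one 16-bit RGB565 pixel and repack it as 24-bit 0xRRGGBB."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Widen each channel to 8 bits by replicating its high bits into the low bits.
    r8 = (r5 << 3) | (r5 >> 2)
    g8 = (g6 << 2) | (g6 >> 4)
    b8 = (b5 << 3) | (b5 >> 2)
    return (r8 << 16) | (g8 << 8) | b8


def blit_with_conversion(dst_row, src_row, x):
    """Formats differ: every pixel has to be unpacked and repacked individually."""
    for i, pixel in enumerate(src_row):
        dst_row[x + i] = rgb565_to_rgb888(pixel)
```

The first function does one bulk copy per row, while the second does several shifts and masks per pixel, which is where the extra time goes when the formats do not match.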
It is a good idea to convert all Surfaces before they are blitted many times.
This may be what you want to do, i.e. keep a cached copy of the post-conversion surface that matches the final output format.
For your sprites, they should be relatively small, so not much difference. For larger areas, the conversion could be [and seems to be] significant.
Instead of a simple blit, which can be done with [the equivalent of] a series of fast [C] memcpy operations, the conversion must be done pixel-by-pixel. This may involve a convolution kernel using surrounding pixels [For a good scaler, I've seen a 2D 6 tap FIR filter used].
Because the sprites are smaller, the converter may choose a simpler conversion algorithm because the distortion would be less noticeable. For the larger area, the converter may choose a more sophisticated algorithm because the distortion would accumulate across a larger area.
So, again, precaching would be the way to go.
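One way the precaching could look is sketched below. SurfaceCache and load_converted are hypothetical names, not part of pygame; the only pygame calls used are pygame.image.load and Surface.convert_alpha, and the loader assumes pygame.display.set_mode() has already been called, since the conversion needs a display to match.

```python
class SurfaceCache:
    """Load and convert each image once, then reuse the converted surface."""

    def __init__(self, loader):
        self._loader = loader   # callable: path -> ready-to-blit surface
        self._surfaces = {}

    def get(self, path):
        # Only the first request for a path pays the load/convert cost.
        if path not in self._surfaces:
            self._surfaces[path] = self._loader(path)
        return self._surfaces[path]


def load_converted(path):
    """Assumed loader: call only after pygame.display.set_mode()."""
    import pygame  # imported lazily so the cache itself does not depend on pygame
    return pygame.image.load(path).convert_alpha()
```

Under these assumptions, the game would build `cache = SurfaceCache(load_converted)` once at startup and call `cache.get(path)` inside the draw loop, so no convert call runs per frame.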
If you can't do that because the source area changes on each frame, you might introduce a one frame lag and do the conversions in multiple threads/cores, subdividing the entire area into subareas across the threads.
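The subdivision idea can be sketched as follows, using a 2D list of pixels and a caller-supplied per-pixel function. All names here are hypothetical. Note the assumption that matters most: in CPython, threads running pure-Python loops do not execute in parallel because of the GIL, so this only shows the partitioning; a real speedup would need the per-band work done in native code or in separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(height, parts):
    """Divide range(height) into up to `parts` contiguous (start, stop) bands."""
    step = max(1, -(-height // parts))  # ceiling division, at least one row
    return [(y, min(y + step, height)) for y in range(0, height, step)]


def convert_band(src, dst, band, convert_pixel):
    """Convert the rows of one band; bands never overlap, so no locking is needed."""
    start, stop = band
    for y in range(start, stop):
        dst[y] = [convert_pixel(p) for p in src[y]]


def convert_parallel(src, convert_pixel, workers=4):
    """Convert a 2D pixel list by handing one band of rows to each worker."""
    dst = [None] * len(src)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(convert_band, src, dst, band, convert_pixel)
                   for band in split_rows(len(src), workers)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return dst
```

With the one-frame lag, the bands for frame N+1 would be converted while frame N is being displayed.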
UPDATE:
So, you note that at first, there would be a decrease in speed since pixel format must be changed.
Precalculation at the game start should be a non-issue as your numbers are 80 milliseconds. The user won't even notice that small a delay in starting the game.
Professional games mask this with a "splash" page with their logo that may do a [trivial] animation (e.g. just morph the color, etc.)
But after the conversion at the start of the game, shouldn't the speed be better for the rest?
Yes, it should be faster, based on what you've already described: Subsequent frames should take 45 ms instead of 80. That gives you a frame rate of about 22 fps (1000 / 45), which might be enough. If you still need to go faster (i.e. to get to 30 fps), doing the subarea technique I already mentioned may help. Also, only blitting what has changed from frame N to N+1 may help.
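The frame-rate figures come from simple arithmetic, sketched here with two hypothetical helpers. It assumes, as the numbers above do, that the measured draw time is effectively the whole frame time.

```python
def fps_from_frame_ms(frame_ms):
    """Frames per second when every frame takes frame_ms milliseconds."""
    return 1000.0 / frame_ms


def frame_budget_ms(target_fps):
    """Milliseconds available per frame to sustain target_fps."""
    return 1000.0 / target_fps


print(fps_from_frame_ms(80))            # 12.5
print(round(fps_from_frame_ms(45), 1))  # 22.2
print(round(frame_budget_ms(30), 1))    # 33.3
```

So reaching 30 fps means getting the per-frame work down from 45 ms to roughly 33 ms.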
I'm still confused on why the speed throughout the game is slower if I converted.
Below is some [crude] code for a blit and a convert (i.e. just to illustrate--not real code).
What you're doing now is like blit_convert below for each frame on your data, which we'll call ability_manager_surface.
Notice that it's slower than a simple blit (e.g. blit_fast or blit_slow below). The fast blits just copy each source pixel to the destination pixel. The sample converter has to take an average of the current source pixel and its nearest neighbors, so it has to fetch five source pixel values for each destination pixel. Hence, it's slower. A real algorithm for scaling might be even slower.
If you do blit_convert during game startup on ability_manager_surface and save the output to an "already converted" variable (e.g. precalc_manager_surface), you can then use blit_fast on each frame using precalc_manager_surface. That is, no need to recalculate "static" data.
# dstv -- destination pixel array
# dsthgt -- destination height
# dstwid -- destination width
#
# dstybase -- destination Y position for upper left corner of inset
# dstxbase -- destination X position for upper left corner of inset
#
# srcv -- source pixel array
# srchgt -- source height
# srcwid -- source width

# ------------------------------------------------------------------------------
# blit_fast -- fast blit
# this uses a 1 dimensional (row-major) array to be fast
def blit_fast(dstv, dsthgt, dstwid, dstybase, dstxbase, srcv, srchgt, srcwid):
    for yoff in range(srchgt):
        # index of the first pixel of this row in the destination and the source
        dstypos = ((dstybase + yoff) * dstwid) + dstxbase
        srcypos = yoff * srcwid
        for xoff in range(srcwid):
            dstv[dstypos + xoff] = srcv[srcypos + xoff]

# ------------------------------------------------------------------------------
# blit_slow -- slower blit
# this uses a 2 dimensional array to be more clear
def blit_slow(dstv, dsthgt, dstwid, dstybase, dstxbase, srcv, srchgt, srcwid):
    for yoff in range(srchgt):
        for xoff in range(srcwid):
            dstv[dstybase + yoff][dstxbase + xoff] = srcv[yoff][xoff]

# ------------------------------------------------------------------------------
# blit_convert -- blit with conversion
def blit_convert(dstv, dsthgt, dstwid, dstybase, dstxbase, srcv, srchgt, srcwid):
    for yoff in range(srchgt):
        for xoff in range(srcwid):
            dstv[dstybase + yoff][dstxbase + xoff] = convert(srcv, yoff, xoff)

# convert -- conversion function
# NOTE: this is more like a blur or soften filter
# the main point is this takes _more_ time than a simple blit
def convert(srcv, ypos, xpos):
    # at the borders, reuse the nearest valid pixel instead of reading outside
    ymax = len(srcv) - 1
    xmax = len(srcv[0]) - 1

    cur = srcv[ypos][xpos]
    top = srcv[max(ypos - 1, 0)][xpos]
    bot = srcv[min(ypos + 1, ymax)][xpos]
    left = srcv[ypos][max(xpos - 1, 0)]
    right = srcv[ypos][min(xpos + 1, xmax)]

    # do a [sample] convolution kernel
    # this equation probably isn't accurate -- just to illustrate something that
    # is computationally expensive on a per pixel basis
    out = (cur * 0.6) + (top * 0.1) + (bot * 0.1) + (left * 0.1) + (right * 0.1)
    return out
Note: The above example uses a "toy" conversion function. To do high res/high quality image rescaling (e.g. 1024x768 --> 1920x1080), you might want to use/select "polyphase resampling" and the computation for that is prodigious. For example, just for grins, see [the mind boggling]: https://cnx.org/contents/xOVdQmDl@10/Polyphase-Resampling-with-a-Ra
UPDATE #2:
found the idea of only updating the stuff that moved helpful
That's standard advice for realtime animation and graphics. Only recalc what you need to. You just need to identify which is which.
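A minimal sketch of "only update what changed", using 2D lists as frames. It finds changed tiles by comparing two non-empty frames of equal size, which is an assumption made to keep the example self-contained; a real game usually knows which sprites moved and collects their old and new rectangles instead of diffing. The function names are hypothetical.

```python
def changed_tiles(prev, curr, tile=8):
    """Return (x, y, w, h) rectangles for the tiles that differ between two frames."""
    height, width = len(curr), len(curr[0])
    dirty = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            h = min(tile, height - ty)
            w = min(tile, width - tx)
            # A tile is dirty if any of its row segments differs.
            if any(prev[y][tx:tx + w] != curr[y][tx:tx + w]
                   for y in range(ty, ty + h)):
                dirty.append((tx, ty, w, h))
    return dirty


def redraw(dst, curr, rects):
    """Copy only the listed rectangles from curr into dst."""
    for x, y, w, h in rects:
        for row in range(y, y + h):
            dst[row][x:x + w] = curr[row][x:x + w]
```

In pygame, the analogous step would be passing a list of changed rectangles to pygame.display.update instead of redrawing and flipping the whole screen.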
However, if I read correctly, you say that my game slows down after converting because I do it each frame.
Based on your original description, that would/should be the case.
This isn't the case, as I convert at the very start, so it should be the fast blit you talk about, but it is faster if I never convert at all
Without your actual code, it's difficult [for me] to speculate. But ...
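When you create a surface with pygame.Surface and no format arguments, the default format is one that best matches the screen format. Thus, it can be blitted without conversion. A surface that holds a loaded image file (e.g. a .png from pygame.image.load) keeps the file's own pixel format instead, which is the case the docs' convert() advice is aimed at.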
So, if you preconvert an offscreen surface, why is it slower [to blit] if the post-converted format matches the screen format? If it's slower, there would be a mismatch somewhere. And, if you create the surface with the default, why does it need conversion?
The standard model is to do operations directly on the screen as much as possible. The screen is "double buffered" and the actual rendering is done with pygame.display.flip at the bottom of your main display loop.
So, I'm not sure where surface conversion comes into it within your program.
Here's a link to some sample programs [including some with sprites]: http://www.balloonbuilding.com/index.php?chapter=example_code
This was but one link from a web search of "all words" for "pygame sample program". So, the above link [plus others] may help you if you're able to compare what you're doing against them.