Reputation: 1174

Improving Display Performance

So I've been tacking on some things to my game and after adding a particular thing the game got pretty laggy. I did some tests using pygame.time.get_ticks() to see where the time was being spent in my loop, and found about 90% of time was spent in two locations. 1. Drawing all my sprites to the screen. 2. Drawing my ability manager, which is just drawing/blitting some images.

I got confused when removing my convert() and convert_alpha() significantly improved performance in my ability manager, and then removing the converts in drawing my sprites did not seem to affect performance.

Does anyone have any idea why convert might slow things down? The docs say it's the best way to go. Also, why might it help in one area and not another?

Edit: Some numbers to show my tests. Removing converts for #2, the ability manager drawing, decreased the average time to draw them from roughly 80 milliseconds to roughly 45 milliseconds. Removing or adding converts for #1, drawing sprites to screen, hardly affects the draw time: the effect is within plus or minus 5 milliseconds. This small change may not be the result of removing converts, so my question should mainly focus on "Why does removing convert help so much in the ability manager drawing?", and only a little on why it might help so much in one area and not another.
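For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of per-section timing. It uses time.perf_counter rather than pygame.time.get_ticks() so it runs without pygame. The section names and the stand-in workloads are hypothetical, not the asker's actual code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated seconds per named section of the loop (names are hypothetical).
section_totals = defaultdict(float)

@contextmanager
def timed(name):
    """Add the wall-clock time spent inside the with-block to section_totals[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        section_totals[name] += time.perf_counter() - start

def report(totals):
    """Return {name: percent of total measured time}, largest share first."""
    grand_total = sum(totals.values()) or 1.0
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {name: 100.0 * secs / grand_total for name, secs in ranked}

# Stand-in workloads; in a real game these would be the draw calls.
for _frame in range(3):
    with timed("draw_sprites"):
        sum(range(20000))
    with timed("draw_ability_manager"):
        sum(range(5000))

shares = report(section_totals)
for name, percent in shares.items():
    print(f"{name}: {percent:.1f}% of measured time")
```

Averaging over many frames, as the asker did, matters here because a single frame's timing is noisy.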

Upvotes: 4

Answers (1)

Craig Estey

Reputation: 33631

Caveat: I don't use pygame but I've written scalers and converters, professionally, for high-def video, so I'm drawing on that experience.

I did look up the documentation here: http://www.pygame.org/docs/ref/surface.html#pygame.Surface.convert

From that:

"If no arguments are passed the new Surface will have the same pixel format as the display Surface. This is always the fastest format for blitting."

In other words, if the format matches it goes fast. But, otherwise, it must do a conversion [which will run slower].

"It is a good idea to convert all Surfaces before they are blitted many times."

This may be what you want to do (i.e. keep a cached copy of the post converted surface that matches the final output format)
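A minimal sketch of that caching idea, written without pygame so it can be run anywhere. ConvertedCache, fake_convert and the "fireball" key are hypothetical names for illustration. In a pygame program the conversion callable would presumably be something like `lambda s: s.convert_alpha()`.

```python
class ConvertedCache:
    """Convert each source surface once and reuse the result on later frames."""

    def __init__(self, convert_fn):
        # convert_fn is any callable taking a source object and returning
        # the converted one (an assumption for this sketch).
        self._convert_fn = convert_fn
        self._cache = {}

    def get(self, key, source):
        # Only the first request for a key pays the conversion cost.
        if key not in self._cache:
            self._cache[key] = self._convert_fn(source)
        return self._cache[key]

# Stand-in for an expensive conversion; records how often it is called.
calls = []

def fake_convert(surface):
    calls.append(surface)
    return ("converted", surface)

cache = ConvertedCache(fake_convert)
for _frame in range(100):
    icon = cache.get("fireball", "fireball.png pixels")

print(len(calls))  # number of conversions performed across 100 frames
```

The point is only that the conversion runs once no matter how many frames draw the result.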

For your sprites, they should be relatively small, so not much difference. For larger areas, the conversion could be [and seems to be] significant.

Instead of a simple blit, which can be done with [the equivalent of] a series of fast [C] memcpy operations, the conversion must be done pixel-by-pixel. This may involve a convolution kernel using surrounding pixels [For a good scaler, I've seen a 2D 6 tap FIR filter used].

Because the sprites are smaller, the converter may choose a simpler conversion algorithm because the distortion would be less noticeable. For the larger area, the converter may choose a more sophisticated algorithm because the distortion would accumulate across a larger area.

So, again, precaching would be the way to go.

If you can't do that because the source area changes on each frame, you might introduce a one frame lag and do the conversions in multiple threads/cores, subdividing the entire area into subareas across the threads.
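The subdivision idea can be sketched as follows, assuming the image is a list of rows and the per-pixel conversion is a plain function. split_rows, convert_band and convert_parallel are hypothetical helpers, not pygame calls. Note that in CPython, threads running pure-Python per-pixel code do not execute in parallel because of the global interpreter lock, so this layout only pays off when the per-band work is done by native code that releases the lock.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(height, parts):
    """Split range(height) into up to `parts` contiguous (start, stop) bands."""
    base, extra = divmod(height, parts)
    bands, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:
            bands.append((start, stop))
        start = stop
    return bands

def convert_band(src, dst, band, convert_pixel):
    """Convert rows start..stop-1 of src into dst; bands never overlap."""
    start, stop = band
    for y in range(start, stop):
        dst[y] = [convert_pixel(p) for p in src[y]]

def convert_parallel(src, convert_pixel, workers=4):
    """Convert the whole image, one band of rows per submitted task."""
    dst = [None] * len(src)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(convert_band, src, dst, band, convert_pixel)
            for band in split_rows(len(src), workers)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return dst

src = [[x + 10 * y for x in range(4)] for y in range(10)]
out = convert_parallel(src, lambda p: p * 2, workers=3)
```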

UPDATE:

"So, you note that at first, there would be a decrease in speed since pixel format must be changed."

Precalculation at the game start should be a non-issue as your numbers are 80 milliseconds. The user won't even notice that small a delay in starting the game.

Professional games mask this with a "splash" page with their logo that may do a [trivial] animation (e.g. just morph the color, etc.)

"But after the conversion at the start of the game, shouldn't the speed be better for the rest?"

Yes, it should be faster, based on what you've already described: Subsequent frames should be 45 ms instead of 80. That now gives you a frame rate of 22 which might be enough. If you still need to go faster (i.e. to get to 30 fps), doing the subarea technique I already mentioned may help. Also, only blitting what has changed from frame N to N+1 may also help.
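The frame-rate arithmetic behind those figures, as a small sketch. It assumes the measured draw time is the whole cost of a frame, which is a simplification. The function names are hypothetical.

```python
def fps_from_frame_time(ms_per_frame):
    """Frames per second if every frame takes ms_per_frame milliseconds."""
    return 1000.0 / ms_per_frame

def frame_budget_ms(target_fps):
    """Milliseconds available per frame to sustain target_fps."""
    return 1000.0 / target_fps

before = fps_from_frame_time(80)   # 12.5 fps
after = fps_from_frame_time(45)    # about 22.2 fps
budget_30 = frame_budget_ms(30)    # about 33.3 ms per frame
needed_saving = 45 - budget_30     # about 11.7 ms still to shave off

print(f"{before:.1f} fps -> {after:.1f} fps; "
      f"need {needed_saving:.1f} ms less per frame for 30 fps")
```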

"I'm still confused on why the speed throughout the game is slower if I converted."

Below is some [crude] code for blit and a convert (i.e. just to illustrate--not real code).

What you're doing now is like blit_convert below for each frame on your data, which we'll call ability_manager_surface.

Notice that it's slower than a simple blit (e.g. blit_fast or blit_slow below). The fast blits just copy each source pixel to the destination pixel. The sample converter has to take an average of the current source pixel and its nearest neighbors, so it has to fetch five source pixel values for each destination pixel. Hence, it's slower. A real algorithm for scaling might be even slower.

If you do blit_convert during game startup on ability_manager_surface and save the output to an "already converted" variable (e.g. precalc_manager_surface), you can then use blit_fast on each frame using precalc_manager_surface. That is, no need to recalculate "static" data.

# dstv -- destination pixel array
# dsthgt -- destination height
# dstwid -- destination width
#
# dstybase -- destination Y position for upper left corner of inset
# dstxbase -- destination X position for upper left corner of inset
#
# srcv -- source pixel array
# srchgt -- source height
# srcwid -- source width

# ------------------------------------------------------------------------------
# blit_fast -- fast blit
# this uses a 1 dimensional array to be fast
def blit_fast(dstv,dsthgt,dstwid,dstybase,dstxbase,srcv,srchgt,srcwid):

    # iterate over the source rows; the destination row is offset by dstybase
    for yoff in range(0,srchgt):
        dstypos = ((dstybase + yoff) * dstwid) + dstxbase
        srcypos = (yoff * srcwid)

        for xoff in range(0,srcwid):
            dstv[dstypos + xoff] = srcv[srcypos + xoff]

# ------------------------------------------------------------------------------
# blit_slow -- slower blit
# this uses a 2 dimensional array to be more clear
def blit_slow(dstv,dsthgt,dstwid,dstybase,dstxbase,srcv,srchgt,srcwid):

    for yoff in range(0,srchgt):
        for xoff in range(0,srcwid):
            dstv[dstybase + yoff][dstxbase + xoff] = srcv[yoff][xoff]

# ------------------------------------------------------------------------------
# blit_convert -- blit with conversion
def blit_convert(dstv,dsthgt,dstwid,dstybase,dstxbase,srcv,srchgt,srcwid):

    for yoff in range(0,srchgt):
        for xoff in range(0,srcwid):
            dstv[dstybase + yoff][dstxbase + xoff] = convert(srcv,yoff,xoff)

# convert -- conversion function
# NOTE: this is more like a blur or soften filter
# the main point is this takes _more_ time than a simple blit
def convert(srcv,ypos,xpos):

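    # clamp neighbor indices at the borders so edge pixels reuse themselves
    ymax = len(srcv) - 1
    xmax = len(srcv[0]) - 1

    cur = srcv[ypos][xpos]

    top = srcv[max(ypos - 1, 0)][xpos]
    bot = srcv[min(ypos + 1, ymax)][xpos]
    left = srcv[ypos][max(xpos - 1, 0)]
    right = srcv[ypos][min(xpos + 1, xmax)]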

    # do a [sample] convolution kernel
    # this equation probably isn't accurate -- just to illustrate something that
    # is computationally expensive on a per pixel basis
    out = (cur * 0.6) + (top * 0.1) + (bot * 0.1) + (left * 0.1) + (right * 0.1)

    return out
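As a quick sanity check of the flat-array row arithmetic a fast blit relies on, here is a self-contained sketch. blit_flat is a hypothetical variant that copies a whole row at a time with a slice assignment (the closest Python analogue to one memcpy per row). It assumes the source row index starts at zero and that the inset fits entirely inside the destination, so no clipping is done.

```python
def blit_flat(dstv, dstwid, dstybase, dstxbase, srcv, srchgt, srcwid):
    """Copy a srchgt x srcwid source into a flat destination, row by row."""
    for srcy in range(srchgt):
        dstpos = (dstybase + srcy) * dstwid + dstxbase  # start of row in dst
        srcpos = srcy * srcwid                          # start of row in src
        dstv[dstpos:dstpos + srcwid] = srcv[srcpos:srcpos + srcwid]

dstwid, dsthgt = 5, 4
dst = [0] * (dstwid * dsthgt)
src = [1, 2, 3,
       4, 5, 6]          # 2 rows, 3 columns

# Place the source with its upper left corner at row 1, column 2.
blit_flat(dst, dstwid, 1, 2, src, 2, 3)

rows = [dst[y * dstwid:(y + 1) * dstwid] for y in range(dsthgt)]
for row in rows:
    print(row)
```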

Note: The above example uses a "toy" conversion function. To do high res/high quality image rescaling (e.g. 1024x768 --> 1920x1080), you might want to use/select "polyphase resampling" and the computation for that is prodigious. For example, just for grins, see [the mind boggling]: https://cnx.org/contents/xOVdQmDl@10/Polyphase-Resampling-with-a-Ra

UPDATE #2:

"found the idea of only updating the stuff that moved helpful"

That's standard advice for realtime animation and graphics. Only recalc what you need to. You just need to identify which is which.
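One way to identify which is which, sketched without pygame: compare each object's rectangle between two frames and collect only the areas that need redrawing. dirty_rects and the object names are hypothetical. In pygame, a list like this could be handed to pygame.display.update(), which accepts a list of rectangles.

```python
def dirty_rects(previous, current):
    """Return the rectangles that must be redrawn between two frames.

    previous and current map an object name to its (x, y, w, h) rectangle.
    An object that moved, appeared or vanished dirties its old and new areas.
    """
    rects = []
    for name in sorted(set(previous) | set(current)):
        old = previous.get(name)
        new = current.get(name)
        if old == new:
            continue  # unchanged: nothing to redraw for this object
        if old is not None:
            rects.append(old)  # erase where it used to be
        if new is not None:
            rects.append(new)  # draw where it is now
    return rects

frame_n = {"player": (0, 0, 16, 16), "hud": (0, 200, 320, 40)}
frame_n1 = {"player": (4, 0, 16, 16), "hud": (0, 200, 320, 40)}

to_redraw = dirty_rects(frame_n, frame_n1)
print(to_redraw)
```

Here the static HUD is skipped entirely, which is the saving the advice is after.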

"However, if I read correctly, you say that my game slows down after converting because I do it each frame."

Based on your original description, that would/should be the case.

"This isn't the case, as I convert at the very start, so it should be the fast blit you talk about, but it is faster if I never convert at all"

Without your actual code, it's difficult [for me] to speculate. But ...

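When you create a surface with pygame.Surface and pass no extra arguments, the default format is one that best matches the screen format. Thus, it can be blitted without conversion. (A surface returned by pygame.image.load, e.g. from a .png, keeps the format of the image file instead, which is why the docs suggest calling convert on it.)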

So, if you preconvert an offscreen surface, why is it slower [to blit] if the post-converted format matches the screen format? If it's slower, there would be a mismatch somewhere. And, if you create the surface with the default, why does it need conversion?

The standard model is to do operations directly on the screen as much as possible. The screen is "double buffered" and the actual rendering is done with pygame.display.flip at the bottom of your main display loop.

So, I'm not sure where surface conversion comes into it within your program.

Here's a link to some sample programs [including some with sprites]: http://www.balloonbuilding.com/index.php?chapter=example_code

This was but one link from a web search of "all words" for "pygame sample program". So, the above link [plus others] may help you if you're able to compare what you're doing against them.

Upvotes: 1
