Reputation: 939
I'm working on a game for DOS which uses video mode 13h.
I've always had issues with screen tearing, but until today I've been ignoring the problem. I assumed it was going to be a challenge to fix since it would involve delaying pixel writes for some precise amount of time. But it was actually a really simple fix.
All you have to do is wait for the vertical retrace bit (bit 3) of the VGA status byte, available at port 0x3da in color mode, to be newly set.
So I just had to modify this old procedure, which writes my frame buffer to the VGA pixel buffer starting at A000:0000:
WRITE_FRAME PROC
;WRITES ALL 64,000 PIXELS (32,000 WORDS) IN THE FRAME BUFFER TO VIDEO MEMORY
push es
push di
push ds
push si
push cx
mov cx, frame
mov ds, cx
xor si, si ;ds:si -> frame buffer (source)
mov cx, vidMemSeg
mov es, cx
xor di, di ;es:di -> video memory (destination)
mov cx, (scrArea)/2 ;writing 32,000 words of pixels
cld ;make sure the string copy counts upward
rep movsw ;write the frame
pop cx
pop si
pop ds
pop di
pop es
ret
WRITE_FRAME ENDP
And here's the modified procedure that waits for the vertical retrace bit to be newly set:
WRITE_FRAME PROC
;WRITES ALL 64,000 PIXELS (32,000 WORDS) IN THE FRAME BUFFER TO VIDEO MEMORY
push es
push di
push ds
push si
push ax
push cx
push dx
mov cx, frame
mov ds, cx
xor si, si ;ds:si -> frame buffer (source)
mov cx, vidMemSeg
mov es, cx
xor di, di ;es:di -> video memory (destination)
mov cx, (scrArea)/2 ;writing 32,000 words of pixels
;If vert. retrace bit is set, wait for it to clear
mov dx, 3dah ;dx <- VGA status register
VRET_SET:
in al, dx ;al <- status byte
and al, 8 ;is bit 3 (vertical retrace bit) set
jnz VRET_SET ;If so, wait for it to clear
VRET_CLR: ;When it's cleared, wait for it to be set
in al, dx
and al, 8
jz VRET_CLR ;loop back till vert. retrace bit is newly set
cld ;make sure the string copy counts upward
rep movsw ;write the frame
pop dx
pop cx
pop ax
pop si
pop ds
pop di
pop es
ret
WRITE_FRAME ENDP
It's not completely perfect. There's still a little jitter, especially when the background behind the sprite is scrolling up or down, but it doesn't hurt to look at anymore.
My question is, why does this work?
My guess is that when the vertical retrace bit is set, the pixels have already been read into the VGA card's memory, and it is currently in the process of writing its already-loaded pixels. However, when the vertical retrace bit is cleared, it is in the process of loading the pixels from A000:0000 into local memory. It uses DMA for this, right?
So, it's only safe to write to A000:0000 when the VGA card is writing pixels (bit set), and not loading pixels in (bit cleared).
Or am I totally wrong?
Upvotes: 2
Views: 1275
Reputation: 364160
There is no separate buffer that a VGA card reads into. (Remember that when VGA was new, even 32kiB of DRAM was expensive. Also, memory bandwidth was low. Some video cards used to use dual-ported RAM so access from the CPU wouldn't disturb scan-out; it could be read/written on one port while the CRTC / RAMDAC was reading pixel data.)
During a vertical-blanking interval, the video card isn't reading or writing video RAM at all; the interval exists so the CRT's deflection circuitry can return the electron beam to the top of the screen without drawing a line up the screen. Then the VGA hardware starts reading video RAM in order again for the scan-out of the next frame.
(Modern hardware of course doesn't drive a CRT, but reading VRAM in order with a "blanking interval" is still a thing).
Waiting for the bit to be cleared and then set, as your loops do, makes it likely that your code starts running at the start of the blanking interval, instead of maybe near the end of it.
If your code that modifies video RAM runs quickly enough, it's done before the hardware starts reading again, so you don't get tearing. (Actually, because you're writing the screen in scan-out order, it only needs to be fast enough to stay ahead of the raster scan, so the screen output doesn't pass the memcpy and display some "old" pixels later in the frame.)
On old hardware, rep movsw wasn't fast enough to copy a whole frame of data during the VBI, especially not when writing to memory-mapped I/O over an ISA bus. Instead you'd typically double-buffer by changing the VGA display start address to point to an already-drawn frame during the VBI. So you draw in one buffer while the other is being scanned out, giving you a whole frame interval to update it, instead of just the VBI.
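A minimal sketch of that page flip, in the same MASM style as the question. It assumes the card is in an unchained 256-color mode (often called Mode X) so that more than one page fits in video memory; in plain mode 13h there is no room for a second 64,000-byte page in the A000 segment. SET_START_ADDR and the choice of BX as the parameter are hypothetical, not something from the question. The CRTC Start Address High and Low registers are indices 0Ch and 0Dh, reached through ports 3D4h (index) and 3D5h (data) in color mode.

```asm
SET_START_ADDR PROC
;HYPOTHETICAL HELPER: BX = OFFSET IN VIDEO MEMORY WHERE SCAN-OUT SHOULD START
;ASSUMES COLOR MODE (CRTC AT 3D4H/3D5H) AND AN UNCHAINED MODE WITH ROOM FOR TWO PAGES
push ax
push dx
mov dx, 3d4h ;dx <- CRTC index port
mov al, 0ch ;index 0Ch = start address high
mov ah, bh ;high byte of the new start offset
out dx, ax ;al -> 3D4h (index), ah -> 3D5h (data)
mov al, 0dh ;index 0Dh = start address low
mov ah, bl ;low byte of the new start offset
out dx, ax
pop dx
pop ax
ret
SET_START_ADDR ENDP
```

VGA hardware typically latches the new start address at the start of vertical retrace, so one common pattern is to call this while the display is active (bit 3 of port 3DAh clear), then wait for the retrace bit to become set before drawing into the page that was visible until then.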
rep movsw runs very fast on actual modern CPUs (e.g. if you boot a modern PC in real mode). If VRAM is mapped as WC (aka USWC: uncacheable speculative write combining), then rep movsw will copy 16 or 32 bytes at a time (Fast Strings mode or even ERMSB (Enhanced Rep Mov/Stos B)), benefiting from write-combining buffers. (Regular stores on WC memory are like NT stores on normal WB (writeback) memory.) Intel errata (like IvyBridge BU2) indicate that REP MOVS on WC memory really does work this way: if you cross a page from WC into UC memory, some stores to UC memory can happen with wide fast-strings stores instead of separate 16-bit stores for rep movsw. That means the CPU must be doing wide stores to WC memory.
If the source data is hot in L1d or L2 cache because you just wrote it, and the destination is USWC video RAM, then blitting it with rep movsw should easily finish during the VBI. If it's mapped as UC (this used to be a BIOS option when WC was a relatively new feature, on Pentium III / early K8 boards at least), then a modern multi-GHz PC is probably still plenty fast.
(BTW, repne cmpsb is still slow, but rep movs/stos is fast).
BTW, even with integrated graphics where "video RAM" is still just part of your regular DRAM, it will be mapped UC (uncacheable) or WC (uncacheable write-combining). Of course, most of the VGA interface is emulated these days. VGA memory might be the real frame buffer used by your graphics hardware, though (if running on bare metal, not DOSBOX or another emulator).
Anyway, on modern hardware at low resolutions, you're probably fine to only check for the bit being cleared, as the copy runs so fast compared to the refresh rate that there's near-zero chance of getting any tearing. Or maybe the first pixel or two might come from the old frame.
On DOSBOX simulating a real old PC with a realistic clock speed:
@Ped7G says rep movsw wasn't fast enough to copy a frame during the VBI, unless you set DOSBOX to simulate a 486 at ~70MHz, or "dynamic / max" speed.
Upvotes: 8