Reputation: 3222
I read a tutorial on NASM, and it has a code example which displays the entire ASCII character set. I understand pretty much everything except why are we pushing ecx and popping ecx as I don't see how it relates to the rest of the code. Ecx has the value of 256 since we want all chars but no idea where and how it's used. What exactly is happening when we push and pop ecx? Why are we moving the address of achar to dx? I don't see us using dx for anything. I understand that we need to increment the address of achar, but I'm confused about how the increment relates to ecx and dx. I would appreciate some insight.
section .text
global _start ;must be declared for using gcc
_start: ;tell linker entry point
call display
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
display:
mov ecx, 256
next:
push ecx
mov eax, 4
mov ebx, 1
mov ecx, achar
mov edx, 1
int 80h
pop ecx
mov dx, [achar]
cmp byte [achar], 0dh
inc byte [achar]
loop next
ret
section .data
achar db '0'
Upvotes: 3
Views: 2264
Reputation: 16606
I understand pretty much everything
Well, then you are quite a bit ahead of me... (although from your further comments, you have become aware of some other nonsensical things in that code :) ).
why are we pushing ecx and popping ecx as I don't see how it relates to the rest of the code. Ecx has the value of 256 since we want all chars but no idea where and how it's used.
It is used by the LOOP instruction (which is not a good idea, see "Why is the loop instruction slow?"): it decrements ecx and jumps while the result is non-zero, i.e. it's a count-down loop mechanism.
As the int 0x80 service call needs ecx for the memory address value, the counter is saved/restored by push/pop around that. A more performant way would be to put the counter value into some spare register, for example esi, and do dec esi / jnz next. An even more performant way would be to reuse the character value itself: if the output started with the zero value, and not the zero digit, then the zero flag after inc byte [achar] could be used to detect the end of the loop (the byte wraps from 255 back to 0 after the 256th character).
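A minimal sketch of the dec esi / jnz next variant described above, written as a complete program. The choice of esi, the start at the space character and the count of 95 (values 32..126, picked here only to keep the output printable) are illustrative assumptions, not part of the tutorial code. It relies on the Linux int 0x80 convention that only eax is overwritten (with the return value), so esi survives the call:

```nasm
; Sketch only: the loop counter lives in esi (assumed to be free),
; so ecx can hold the buffer address and no push/pop is needed.
section .text
global _start
_start:
    mov     byte [achar], ' '   ; assumption: start at first printable ASCII (32)
    mov     esi, 95             ; assumption: 95 characters, values 32..126
next:
    mov     eax, 4              ; sys_write
    mov     ebx, 1              ; file descriptor 1 (stdout)
    mov     ecx, achar          ; address of the byte to write
    mov     edx, 1              ; length: one byte
    int     0x80                ; only eax is changed by the call
    inc     byte [achar]        ; advance to the next character value
    dec     esi                 ; count down, sets ZF when esi reaches zero
    jnz     next                ; repeat while the counter is non-zero
    mov     eax, 1              ; sys_exit
    xor     ebx, ebx            ; exit status 0
    int     0x80

section .bss
achar:  resb 1                  ; one byte of storage for the current character
```

It should assemble with nasm -f elf32 and link with ld -m elf_i386. The zero-flag variant would drop esi entirely, start achar at value 0 and use jnz next directly after inc byte [achar], at the cost of also sending the control characters and the bytes 128-255 to the terminal.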
achar db '0'
It's not clear to me why "display all ASCII characters" starts at the digit zero (value 48); that seems weird to me, I would start at zero. But that has another caveat: the Linux console I/O encoding is set by the environment, and on any common Linux installation it is UTF-8 nowadays. So the only valid printable single-byte characters are the values 32-126 (identical to ordinary 7-bit ASCII, which makes this part of the example work well), while the values 0-31 and 127 are non-printable control characters, also identical to 7-bit ASCII. Values 128-255 belong in UTF-8 to multi-byte characters (example: ř is the two bytes 0xC5 0x99), and as single bytes they are invalid byte sequences, because the remaining bytes of the UTF-8 encoded code point are missing.
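To illustrate the multi-byte point, here is a small sketch that writes the complete two-byte UTF-8 sequence of ř in a single write call. The names msg and msgLen are made up for this example, and it assumes a terminal configured for UTF-8, in which case it should show the character followed by a new line:

```nasm
; Sketch only: output a two-byte UTF-8 character as one complete sequence.
section .text
global _start
_start:
    mov     eax, 4              ; sys_write
    mov     ebx, 1              ; file descriptor 1 (stdout)
    mov     ecx, msg            ; address of the byte sequence
    mov     edx, msgLen         ; number of bytes to write (3)
    int     0x80
    mov     eax, 1              ; sys_exit
    xor     ebx, ebx            ; exit status 0
    int     0x80

section .data
msg:    db 0xC5, 0x99, 0x0A     ; UTF-8 bytes of U+0159, then a new line
msgLen: equ $ - msg             ; length computed by the assembler
```

Writing only 0xC5, or only 0x99, would hand the terminal an incomplete sequence, which is what the tutorial loop ends up doing for every value above 127.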
In the age of DOS you could have written code that writes full 8-bit values from 0 to 255 directly into the VGA text-mode video memory, and each value has a distinct graphical representation. You could load a custom font into the VGA, or use a known code page for particular characters; this is also sometimes referred to as "extended ASCII", but the common DOS installation had a different one from the link in your comments, with many more box-drawing characters. This included the \r and \n control characters, which for the VGA are just further font glyphs, not carriage-return and line-feed control chars (that meaning is created by the BIOS/DOS service call, which instead of outputting the \n character moves the internal cursor to the next line and discards the char from the output).
It's impossible to re-create this with linux console I/O (unless the UTF8 font contains all the weird DOS glyphs, and you would output their correct UTF8 encoding instead of single byte values).
The conclusion is that the example starts with the value '0' (48), and up to the value 126 it outputs correct printable ASCII characters. After 126 it outputs "something", and as those bytes will sometimes form invalid UTF-8 encodings, I would technically call it "bogus" output with undefined behaviour; you will probably get different results for different Linux versions and console settings.
Also a NASM-style note: put a colon after labels, i.e. achar: db '0'. That will save you when you use an instruction mnemonic as a label by accident, like loop: or dec: db 'd'.
mov dx, [achar]
The dx is not used any further, so this is a useless instruction.
cmp byte [achar], 0dh
Flags from this compare are not used any further either, so this is also useless.
So the adjusted example can look like this:
section .text
global _start ;must be declared for using gcc
_start: ;tell linker entry point
call display
mov eax,1 ;system call number (sys_exit)
xor ebx,ebx ;exit status 0 (ebx still holds 1 from the write calls)
int 0x80 ;call kernel
; displays all valid printable ASCII characters (32-126), and new-line after.
display:
mov byte [achar], ' ' ; first valid printable ASCII
next:
mov eax, 4
mov ebx, 1
mov ecx, achar
mov edx, 1
int 0x80
inc byte [achar]
cmp byte [achar], 126
jbe next ; repeat until all chars are printed
; that will output all 32..126 printable ASCII characters
; display one more character, new line (reuse of registers)
mov byte [achar], `\n` ; NASM uses backticks for C-like meta chars
mov eax, 4 ; ebx, ecx and edx are already set from loop above
int 0x80
ret
section .bss
achar: resb 1 ; reserve one byte for character output
But it would make more sense to prepare the whole output in memory first and output it in one go, like this:
section .text
global _start ;makes symbol "_start" global (visible for linker)
_start: ;linker's default entry point
call display
mov eax,1 ;system call number (sys_exit)
xor ebx,ebx ;exit status 0 (ebx still holds 1 from the write call)
int 0x80 ;call kernel
; displays all valid printable ASCII characters (32-126), and new-line after.
display:
; prepare in memory string with all ASCII chars and new-line
mov al,' ' ; first valid printable ASCII
mov edi, allAsciiChars
mov ecx, edi ; this address will be used also for "write" int 0x80
nextChar:
mov [edi], al
inc edi
inc al
cmp al, 126
jbe nextChar
; add one more new line at end
mov byte [edi], `\n`
; display the prepared "string" in one "write" call
mov eax, 4 ; sys_write, ecx is already set
mov ebx, 1 ; file descriptor STDOUT
lea edx, [edi+1]; edx = edi+1 (memory address beyond last char)
sub edx, ecx ; edx = length of generated string
int 0x80
ret
section .bss
allAsciiChars: resb 126-' '+1+1 ; reserve space for ASCII characters and \n
All examples were tested with nasm 2.11.08 on 64-bit Linux ("KDE neon" distro based on Ubuntu 16.04), and built with the commands:
nasm -f elf32 -F dwarf -g test.asm -l test.lst -w+all
ld -m elf_i386 -o test test.o
with output:
$ ./test
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
Upvotes: 3