Reputation: 749
I'm having trouble managing "keyboard overflows" in Intel assembly. The main issue is that after reading the maximum size specified by the read call, the remaining data is thrown to the terminal. I'm using Linux on the x64 architecture. This is, in fact, homework. My main idea is the following:
%define maxChars 10
%define maxChars_2 100
section .bss
strLida : resb maxChars
strLidaL : resd 1
read:
mov dword [strLidaL], maxChars
mov rax, 0
mov rdi, 1
mov rsi, strLida
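mov edx, [strLidaL] ; strLidaL is a dword, so load only 32 bits
syscall
mov [strLidaL], eax ; store the byte count returned by read
size_compare:
cmp dword [strLidaL], maxChars ; NASM needs an explicit operand size here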
jge overflow
overflow:
mov dword [strLidaL_2], maxChars_2
mov rax, 0
mov rdi, 1
mov rsi, strLida_2
mov edx, [strLidaL_2] ; dword variable, so load only 32 bits
syscall
This is far from a good solution: it jumps to another read when the maximum number of characters is reached, so it can swallow the remaining overflowing characters. Is there a syscall for that? Is there a better solution? Thanks for the input.
Upvotes: 1
Views: 507
Reputation: 44066
Your solution, once generalised, is perfectly fine.
First of all, consider this C program, cook.c:
#include <stdio.h>
int main()
{
char buffer[200];
scanf("%s", buffer);
return 0;
}
It's vulnerable and the return is redundant, but bear with me.
This program just reads a string from the input, pretty much like yours.
If you type a short string like hello world, scanf will read hello into buffer, but world won't appear in the terminal (unlike your program).
So how does scanf do the trick?
A handy way to analyse a program without reverse engineering it (or fetching the source) is strace.
If I run strace ./cook on my system, I can see that cook executes the sys_read system call as:
read(0, "hello world\n", 1024) = 12
Thus scanf simply reads, in this case, in chunks of 1024 bytes.
I don't know the logic libc uses to set the length of the read, and since I don't think it's relevant here, I won't dig into it.
What if we type more than 1024 characters?
If I type 1 2 3 4 ... 1024 (i.e. all the numbers up to 1024, separated by spaces) and press Enter, the result is:
manager@debian64-jboss:~$ ./cook
1 2 3 4 5 [... omitted]
manager@debian64-jboss:~$ 284 285 286 287 288 289 290 [... omitted]
showing that part of the input makes it to the terminal prompt.
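If we do the math, we get 9*2 + 90*3 + 184*4 = 1024 as expected: the nine one-digit numbers take 2 bytes each (digit plus space), the ninety two-digit numbers take 3, and the 184 numbers from 100 to 283 take 4.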
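The byte count can be double-checked with a few lines of C. This is only a sketch: bytes_typed is a hypothetical helper (not part of cook.c) that adds up the length of each number plus the space typed after it.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes needed to type "1 2 3 ... last " with one space after each number. */
int bytes_typed(int last)
{
    int total = 0;

    assert(last >= 0);
    for (int n = 1; n <= last; n++)
        total += snprintf(NULL, 0, "%d ", n); /* digits plus the space */
    return total;
}
```

Calling bytes_typed(283) gives the size of everything typed up to and including "283 ", which is where the first 1024-byte read ends in the session above.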
Long story short: you are not really experiencing a problem - that's the expected behaviour under Linux.
In your case, it is more annoying because you are reading only a few bytes at a time.
The long story involves the input processing mode: canonical or non-canonical.
The default one is canonical where the OS buffers lines of text in order to provide input editing facilities.
If your program asks for 5 bytes and the user types hello world and presses Enter, the OS will buffer the whole "hello world\n" string, but sys_read will read only up to the space, leaving " world\n" for the next reader (the shell).
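The leftover effect can be reproduced without a terminal. The sketch below uses a pipe as a stand-in for the terminal's line buffer, which is an assumption made for illustration (a pipe has no canonical mode, but a short read leaves the rest queued in the same way). leftover_after_short_read is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Queues `line` in a fresh pipe, reads `want` bytes as "the program",
 * then reads what is left as "the next reader".
 * Returns the number of leftover bytes copied into `rest`, or -1 on error.
 * Meant for short lines that fit in the pipe buffer.
 */
ssize_t leftover_after_short_read(const char *line, size_t want,
                                  char *rest, size_t rest_cap)
{
    int fds[2];
    char first[64];
    size_t len = strlen(line);
    ssize_t n = -1;

    assert(want <= sizeof first);
    if (pipe(fds) != 0)
        return -1;

    /* The whole line is queued at once, like a line typed in canonical mode. */
    ssize_t written = write(fds[1], line, len);
    close(fds[1]); /* an empty pipe now reports end of file instead of blocking */

    /* The program asks for only `want` bytes; the next reader gets the rest. */
    if (written == (ssize_t)len && read(fds[0], first, want) >= 0)
        n = read(fds[0], rest, rest_cap);

    close(fds[0]);
    return n;
}
```

With the line "hello world\n" and want set to 5, the second read receives the 7 bytes " world\n", which is what the shell picks up in the real scenario.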
You can choose to fix or mitigate this.
Reading in bigger sizes mitigates the problem - like the C example. Since you should always check the return value of a function or system call this shouldn't impact heavily on your program layout.
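As a rough illustration of the mitigation, the sketch below reads one large chunk and keeps only as many bytes as the caller has room for. The names read_line_truncated and CHUNK are made up for this example, and the 1024-byte size simply mirrors the strace output above.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

enum { CHUNK = 1024 }; /* assumed read size, taken from the strace output */

/*
 * Reads up to CHUNK bytes from fd in a single call and copies at most
 * `cap` of them into `out`; anything beyond `cap` is dropped.
 * Returns the number of bytes kept, or -1 if read failed.
 */
ssize_t read_line_truncated(int fd, char *out, size_t cap)
{
    char chunk[CHUNK];
    ssize_t got;
    size_t keep;

    assert(out != NULL);
    got = read(fd, chunk, sizeof chunk);
    if (got < 0)
        return -1; /* always check the return value */

    keep = (size_t)got < cap ? (size_t)got : cap;
    memcpy(out, chunk, keep);
    return (ssize_t)keep;
}
```

Input longer than CHUNK bytes still leaks to the next reader, so this only mitigates the problem, as noted above.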
Alternatively, you can follow the advice of comp.lang.c and read all the input.
In assembly, you can do that in a general way with
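; In: edi = file descriptor to drain
; Clobbers: rax, rcx, rdx, rsi, r11 (syscall itself overwrites rcx and r11)
emptyfd:
lea rsi, [rsp-80h] ; Use the 128-byte red zone as the read buffer
mov edx, 80h ; Chunk length
.read_chunk:
xor eax, eax ; sys_read is system call number 0
syscall
; Did we fill the whole buffer? If so, more input may be pending: read again.
; Errors come back as negative values, which never equal 80h, so they end the loop too.
cmp rdx, rax
je .read_chunk
ret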
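For comparison, here is a C sketch of the same idea with one variation that is not in the assembly above: it also stops when a chunk ends in a newline. A pure length check issues one more read when the remaining input is an exact multiple of the chunk size, and that read waits for new input. The sketch assumes a terminal in canonical mode, where a single read never crosses a line boundary; discard_rest_of_line is a hypothetical name.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Discards the rest of the current line on fd.
 * Returns the number of bytes thrown away, or -1 if read failed.
 */
long discard_rest_of_line(int fd)
{
    char chunk[128];
    long discarded = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t got = read(fd, chunk, sizeof chunk);
        if (got < 0)
            return -1; /* read error */
        discarded += got;
        if (got == 0 || chunk[got - 1] == '\n')
            return discarded; /* end of file or end of line */
    }
}
```

It would be called right after the main read whenever that read filled its buffer without seeing a newline.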
Beware of the clobbered registers.
I'm not aware of any system call that does this, and I wouldn't expect one: the standard input has no special meaning for the kernel.
As a side note, a good way to zero a register is xoring it with itself.
Also, writing to the lower 32-bit half of a 64-bit register zeroes the upper 32 bits, so mov rdi, 1 can be written as mov edi, 1.
NASM will implicitly convert the former into the latter anyway.
Upvotes: 1