JonGrimes20

Reputation: 165

How to determine if a value is a character or number in MASM

I'm having trouble figuring out how to determine whether a value is a number or a letter in MASM assembly language. This program should go through an array, find the first number in it, and print that number along with the index where it was found. I'm using the Irvine32.inc library, which contains IsDigit, but for some reason it isn't working and I don't know why.

Here's the code:

TITLE Number Finder

INCLUDE Irvine32.inc

.data
AlphaNumeric SDWORD 'A', 'p', 'Q', 'M', 67d, -3d, 74d, 'G', 'W', 92d
Alphabetical DWORD 'A', 'B', 'C', 'D', 'E'
Numeric      DWORD  0, 1, 2, 3, 4, 5, 6
index        DWORD  ?
valueFound   BYTE "number found: ", 0
atIndex      BYTE "at index: ", 0
noValueFound BYTE "no numeric found", 0
spacing      BYTE ", ", 0

;DOESNT WORK CORRECTLY
;SKIPS the value 67

.code
main PROC
mov esi, OFFSET AlphaNumeric    ;point to start of array
mov ecx, LENGTHOF AlphaNumeric  ;set loop counter
mov index, 0

mov eax, 0                      ; clear eax

L1: mov al, [esi]
    call IsDigit                    ; ZF = 1 -> valid digit , ZF = 0 -> not a valid digit

;jmp if digit
jz NUMBER_FOUND 

;jmp if char
jnz CHARACTER

;this probably never gets reached
inc index
add esi, TYPE AlphaNumeric
loop L1

;if loop finishes without finding a number
jmp NUMBER_NOT_FOUND

;next iteration of loop if val is a char
CHARACTER:
add esi, TYPE AlphaNumeric
add index, 1
loop L1

NUMBER_FOUND:
mov edx, OFFSET valueFound
call WriteString                ; prints "number found"
mov eax, [esi]
call WriteInt                   ; prints the number found
mov edx, OFFSET spacing
call WriteString
mov edx, OFFSET atIndex
call WriteString                ; prints "at index: "
mov eax, index
call WriteDec                   ; prints the index value

;jmp to NEXT to skip NUMBER_NOT_FOUND block
jmp NEXT

NUMBER_NOT_FOUND:
mov edx, OFFSET noValueFound
call WriteString

NEXT:

exit
main ENDP
END main

When I debug it and it gets to the loop iteration that processes the value 67d, it loads 43 into al, which is its hex representation. Since 43h lines up with the ASCII value 'C', I'm assuming that call IsDigit processes this as a letter and not a number. It also skips all the numbers and prints "Number found: +65, at index: 10", which shouldn't even happen. Is there an operation I can use to convert the hex value to the decimal value so that the IsDigit call works correctly? If someone could please explain a way to evaluate whether a value in an array is a number or a letter (capital or lowercase), that would be very much appreciated.

Upvotes: 0

Views: 1583

Answers (1)

Peter Cordes

Reputation: 364428

This is an impossible task. The most you can do is check for values that aren't the ASCII code for an alphabetic character (https://asciitable.com/), which is what your code is attempting. In your array, index 5 (the value -3) is the first element where that's the case.

67 (decimal) is the same byte value as 'C'. Once it's assembled into binary bytes in your .data section, they're the same single byte. Thus there's no way you can tell how it was written in the source; db 67, 'C' is the same pair of bytes as db 'C', 67. It's a number that's in the range of upper-case ASCII codes. Another equivalent way to write the same value in the source is 43h.
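To make that concrete outside of assembly, here is a minimal C sketch, assuming an ASCII execution character set. The array and function names are made up for illustration; the point is only that the compiled bytes carry no trace of how each value was spelled in the source.

```c
#include <string.h>

/* Three spellings of one value: decimal, character literal, hex. */
static const unsigned char as_written_a[] = { 67, 'C', 0x43 };
/* The same spellings in a different order. */
static const unsigned char as_written_b[] = { 'C', 0x43, 67 };

/* Returns 1 if both arrays hold identical bytes, 0 otherwise. */
int same_bytes(void) {
    return memcmp(as_written_a, as_written_b, sizeof as_written_a) == 0;
}
```

On an ASCII system same_bytes() returns 1: every element of both arrays is the single byte 43h.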

Bytes don't have types associated with them, just the 8-bit bit-pattern which represents a value. Different interpretations of the same bits could be different values, e.g. -3 (signed) and 253 (unsigned) are both represented by the bit-pattern 0b11111101, which is 0xfd. All of those are valid ways of writing the value that gets loaded into AL by your program; as a character value, the same byte also represents a font glyph in some 8-bit character sets. Numbers in a computer are binary; hex and decimal are just convenient formats for humans, so debuggers convert binary values into strings of ASCII digits for display.
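The signed/unsigned point can be sketched in C as follows. The helper names are hypothetical; the sketch relies only on int8_t being an 8-bit two's complement type, which the C standard guarantees wherever that type exists.

```c
#include <stdint.h>
#include <string.h>

/* Read one byte's bit pattern as a signed 8-bit value. */
int as_signed(uint8_t bits) {
    int8_t s;
    memcpy(&s, &bits, 1);   /* copy the bit pattern unchanged */
    return s;
}

/* Read the same bit pattern as an unsigned 8-bit value. */
int as_unsigned(uint8_t bits) {
    return bits;
}
```

For the byte 0xfd, as_signed gives -3 and as_unsigned gives 253. The stored bits are identical; only the interpretation chosen by the code differs.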

If your program doesn't keep track of types separately, that info is not recoverable.

Normally you write programs that know a whole array holds 8-bit numbers, or holds ASCII codes. It's the same in C, where functions take int8_t* or char*: both point to bytes with the same size and representation, but they have different semantic meaning for human programmers. Another example is int* vs. char*: you certainly could look at the bytes of an int array as character data (with many of the characters being '\0' or '\xff' for small positive / negative integer values), but you don't try to figure out the type by looking at the byte values. Higher-level languages like Python and Perl store a type along with each object, like a struct { enum type; union { stuff }; }, with many types, such as a string, including a pointer.
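If you really do need numbers and characters mixed in one array, one option is to store the tag yourself. Below is a rough C sketch of that struct { enum; union } idea. All names here (kind, tagged, first_number) are invented for illustration, and this is not how Python or Perl actually lay out their objects.

```c
#include <stdint.h>

/* Hypothetical tag saying what the payload means. */
enum kind { KIND_NUMBER, KIND_CHAR };

struct tagged {
    enum kind kind;        /* type information kept alongside the value */
    union {
        int32_t number;    /* valid when kind == KIND_NUMBER */
        char    ch;        /* valid when kind == KIND_CHAR */
    } as;
};

/* Index of the first element tagged as a number, or -1 if there is none. */
int first_number(const struct tagged *items, int count) {
    for (int i = 0; i < count; i++) {
        if (items[i].kind == KIND_NUMBER)
            return i;
    }
    return -1;
}
```

With the first five elements of the question's array tagged this way ('A', 'p', 'Q', 'M' as characters, 67 as a number), first_number returns 4, because the decision is made from the tag and never from the byte value. The same layout works in assembly: store a tag dword next to each value dword and compare the tag.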


Re: implementing an IsAlpha function: see the Q&A "What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?"; it only takes a few instructions.

;; input in DL, unmodified (EAX is clobbered)
IsAlpha:
    mov     eax, edx  ; work on a copy so DL stays intact
    or      al, 20h   ; set bit 5: force to lower case if it wasn't already
    sub     al, 'a'   ; letters map to 0..25; anything else lands above 25 as an unsigned value
    cmp     al, 25    ; 'z'-'a' = index of the last letter in the alphabet
      ; setbe al      ; for a boolean 0/1 return value in AL
    ret
;; return in FLAGS: ja non_alpha    or   jbe  alphabetic
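For readers who find the unsigned range-check trick easier to follow in C, this is a rough equivalent of the routine above. The function name is hypothetical and ASCII is assumed.

```c
/* Returns 1 if c is an ASCII letter (either case), 0 otherwise. */
int is_alpha_ascii(unsigned char c) {
    /* Set bit 5 so 'A'..'Z' fold onto 'a'..'z'. */
    unsigned char folded = (unsigned char)(c | 0x20);
    /* One unsigned compare covers both bounds: values below 'a' wrap around to large numbers. */
    return (unsigned char)(folded - 'a') <= 25;
}
```

Note that is_alpha_ascii(67) returns 1, the same as is_alpha_ascii('C'): the check classifies byte values and cannot tell which spelling the source used.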

Upvotes: 1
