jackson blackson
jackson blackson

Reputation: 311

Finding Occurrences in a String in Assembly x86

I have been working on this program where I have to input a string and then display the character distribution in that string.

For example:
if the input is “minecode” the output should be

C – 1
O – 1
D – 1
E – 2
I – 1
M – 1
N – 1

Here is what I tried to do, but I really don't know how to traverse the loop and check for similar character and then increment the count. The assembler is MASM 615 running on a 32 bit machine.

.686
.MODEL flat, stdcall
.STACK
INCLUDE Irvine32.inc
.DATA
  msg0 BYTE "Enter a string of characters: ",0
  msg1 BYTE "Character Distribution: ",0
  MainArray    dword 10000 dup (?)
  UniqueChar   dword 10000 dup (?)
  CharCount    dword 10000 dup (?)
.CODE
MAIN PROC
    lea edx, msg0
    call WriteString
    call dumpregs        ; 1
    call ReadString
    mov MainArray, eax
    call dumpregs        ; 2
    mov MainArray, ebx
    call dumpregs        ; 3
    call CRLF
    lea edx, msg1
    call WriteString
    call CharDist        ; Calls the procedure
    call dumpregs        ; 5
    exit
MAIN ENDP
CharDist PROC
    mov ecx, lengthof MainArray
    mov esi, OFFSET MainArray
    L1:
; what to do here?? 
Loop L1:
CharDist ENDP
END MAIN

Upvotes: 3

Views: 2888

Answers (2)

Davislor
Davislor

Reputation: 15164

One possible approach: make an array of 256 counters, store its base address in ebx, and for each byte in the string, increment the counter at that offset from ebx. Afterward, loop over your array of counters and print the nonzero counts.

You never say whether this is a string terminated by a 0 byte (C-style), one preceded by its length (Pascal-style), or one whose length is passed in as a second parameter, but that will determine when you terminate the loop. If you’re looking for a terminating zero, test the byte you just read, and if you’re counting a specific number of bytes, keep the count of bytes remaining in ecx and test that. (There are special instructions to branch conditionally if ecx is not zero, if you feel like using them.)

If you keep your pointer to the string in esi, you can load the next byte into al with the lodsb instruction. Alternatively, you can mov from [esi] and then inc esi. If you zero out eax before storing each byte in al, this will give you an index in eax that you can use with an array of counters.

Upvotes: 5

Ped7g
Ped7g

Reputation: 16626

Other possible approach:

Should be easier to understand and actually it is very "human" straightforward (like how would you do it on paper with pencil), so I wonder how you didn't come up with this one ... you should not only try to understand how this works, but also try to figure out why you didn't see it, what is confusing/blocking you.

for all letters of alphabet [A-Z] as "letter" {
    counter = 0;
    for all characters in input string as "char" {
        if (letter ignore_case_equals char) ++counter;
    }
    if (0 < counter) {
        display "letter - counter" and new line.
    }
}

This may be actually faster for English alphabet and short string, like 3-5 letters (containing letters only); than the suggested count sort, as alphabet is 26 chars and count table is 256 bytes, so 256/26 = ~9. But count table will reveal count for any character, including non-alphabet ones and also it will stall less on branching.


Also notice, how you did start with emitting code for prompts/input/etc... things you already could do in Assembly, avoiding the unknown.

I did start with almost-English description of algorithm. Because I don't care, what I know to write in Assembly (I already know, that pretty much anything :) ), first I want to be sure I know what I want the code do for me and what kind of data I want to process. Then I will command the CPU to do it, by finalizing the data structures plan and splitting those English notes into simpler and simpler steps, until they start to resemble instructions, then I fill up the instructions between comments.

Don't skip the native language reasoning phase, it may save you lot of work on the code, if you figure out some elegant way to organize data or cut down algorithm steps by reusing certain part of it. Not every problem has short elegant solution, but many have.

Upvotes: 3

Related Questions