spoonlagoon
spoonlagoon

Reputation: 3

Finding repeated occurrences of bytes in C

I am trying to modify a function so that it can traverse through a block of memory and find repeated sections of bytes. The current function I have written can find a single one, but I think I am having trouble with my pointers and logic.

Here is my original function to find a single byte. It returns the number of times it is found and the offsets are stored in pOffsets:

//pOffsets points to an array large enough to hold the results
//blockAddress points to the first byte of the memory region
//blockLength is the number of bytes in the memory region
//Byte is the value to be searched for
//maxBytes is the maximum number of bytes that are to be copied

uint32_t findOccurrencesOfByte(uint32_t* const pOffsets,
                           const uint8_t* const blockAddress, uint32_t blockLength,
                           uint8_t Byte, uint32_t maxBytes) {
    uint32_t count, pos;
    count = 0;
    for (pos = 0; count < maxBytes && pos < blockLength; pos++) {
        if (*(blockAddress + pos) == Byte) {
            *(pOffsets + count) = pos;
            count++;
        }
    }
    return count;
}

This is the piece of code I am trying to change to find a section of bytes:

//pOffsets points to an array of dimension at least 256
//blockAddress points to the first byte of the memory region
//blockLength is number of bytes in the memory region
//pPattern points to a copy of the pattern that is to be found
//patternLength is the number of bytes in the target pattern

uint32_t findOccurrencesOfPattern(uint32_t* const pOffsets,
                              const uint8_t* const blockAddress, uint32_t blockLength,
                              const uint8_t* const pPattern, uint8_t patternLength) {
    uint32_t count, pos, i;
    count = 0;
    for (pos = 0; pos < blockLength; pos++) {
        for (i = 0; patternLength != 0; patternLength--) {
            if (*(blockAddress + pos + i) == *pPattern) {
                *(pOffsets + count) = pos;
                count++;
            }
        }
    }
    return count;
}

Can anyone help me figure out what I'm doing wrong? Thanks

Upvotes: 0

Views: 398

Answers (2)

Chris Korlinchak
Chris Korlinchak

Reputation: 51

If you are storing an array of pointers to the start of match locations. The input type should be uint8_t**. uint8_t** is a pointer to uint8_t* which is the address of the match location.

You do not need to have const twice in the same input parameter.

I also agree with the comment above that index with [ ] makes for more readable code.

The following should work for you.

#define MAX_MATCHES 2

const uint8_t  data[] = { 1,2,3,4,5,6,1,2,3,4,5,6,8,7,6,4,5,6 };
const uint8_t  pattern[] = { 4,5,6 };

const uint8_t * startAddress[MAX_MATCHES];

uint32_t findOccurrencesOfPattern(const uint8_t** matchAddress,
    const uint8_t* blockAddress, uint32_t blockLength,
    const uint8_t* pPattern, uint8_t patternLength)
{
    int count = 0;

    for (int i = 0; i <= blockLength - patternLength; i++)
    {
        bool match = true;
        for (int j = 0; j < patternLength; j++)
        {
            if (blockAddress[i + j] != pPattern[j])
            {
                match = false;
                break;
            }
        }

        if (match)
        {
            match = false;
            matchAddress[count] = &blockAddress[i];
            count++;

            //Don't look for overlapping patterns
            i += patternLength;  

            //Don't overrun matchAdress
            if (count >= MAX_MATCHES) 
            {
                return count;
            }
        }
    }
    return count;
}

Test Function

int main()
{
    uint32_t count = findOccurrencesOfPattern(startAddress, data, sizeof(data), pattern, sizeof(pattern));
}

Upvotes: 1

Vlad Rusu
Vlad Rusu

Reputation: 1479

Basically, what you are trying to do is to match the pattern starting from every point in your memory block. Decreasing patternLength makes no sense, since you will need that length multiple times.

uint32_t findOccurrencesOfPattern(uint32_t* const pOffsets,
                              const uint8_t* const blockAddress, uint32_t blockLength,
                              const uint8_t* const pPattern, uint8_t patternLength) {
    uint32_t count, pos, i;
    count = 0;
    for (pos = 0; pos < blockLength - patternLength; pos++) {
        uint8_t ok = 1;
        for (i = 0; i < patternLength; i++) {
            if (*(blockAddress + pos + i) != *(pPattern + i)) {
                ok = 0;
                break;
            }
        }
        if (ok)
            count++;
    }
    return count;
}

Also, you don't want to increase the count inside the inner for, because you would be counting every matched byte (even those in partial matches).

You should stop the iteration at blockLength - patternLength, because after that you definitely cannot find any other match.

And the approach should be: assume that you find a match (ok = 1 initially). Then check if it is so. When you find a contradiction, say it is not ok and break the loop (you already determined that it's not a match).

Upvotes: 1

Related Questions