RSK
RSK

Reputation: 29

C++/ActiveX replacing realloc with malloc, memcpy, free. Functional and Performance tests

I've been assigned to a project that is a complex legacy system written in C++ and ActiveX ~ 10 years old.

The setup is Microsoft Visual Studio 2008.

Whilst there are no issues with the system right now, as part of the security review of the legacy system, an automated security code scanning tool has marked instances of realloc as Bad Practice issue, due to security vulnerability.

This is because realloc function might leave a copy of sensitive information stranded in memory where it cannot be overwritten. The tool recommends replacing realloc with malloc, memcpy and free.

Now realloc function being versatile, will allocate memory when the source buffer is null. It also frees memory when the size of the buffer is 0. I was able to verify both these scenarios. Source: MDSN Library 2001

realloc returns a void pointer to the reallocated (and possibly moved) memory block. The return value is NULL if the size is zero and the buffer argument is not NULL, or if there is not enough available memory to expand the block to the given size. In the first case, the original block is freed. In the second, the original block is unchanged. The return value points to a storage space that is guaranteed to be suitably aligned for storage of any type of object. To get a pointer to a type other than void, use a type cast on the return value.

So, my replacement function that uses malloc, memcpy and free has to cater for these cases.

I have reproduced below the original code snippet (an array implementation) that uses realloc to dynamically resize and shrink its internal buffer.

First the class definition:

template <class ITEM>
class CArray 
{
// Data members:
protected:
ITEM                *pList;
int                 iAllocUnit;
int                 iAlloc;
int                 iCount;

public:
CArray() : iAllocUnit(30), iAlloc(0), iCount(0), pList(NULL)
{
}

virtual ~CArray()
{
    Clear(); //Invokes SetCount(0) which destructs objects and then calls ReAlloc
} 

The existing ReAlloc method:

void ReAllocOld()
{
    int iOldAlloc = iAlloc;

    // work out new size
    if (iCount == 0)
        iAlloc = 0;
    else
        iAlloc = ((int)((float)iCount / (float)iAllocUnit) + 1) * iAllocUnit;

    // reallocate
    if (iOldAlloc != iAlloc)
    {
        pList = (ITEM *)realloc(pList, sizeof(ITEM) * iAlloc);
    }
}

The following is my implementation that replaces these with malloc,memcpy and free:

void ReAllocNew()
{
    int iOldAlloc = iAlloc;

    // work out new size
    if (iCount == 0)
        iAlloc = 0;
    else
        iAlloc = ((int)((float)iCount / (float)iAllocUnit) + 1) * iAllocUnit;

    // reallocate
    if (iOldAlloc != iAlloc)
    {

        size_t iAllocSize = sizeof(ITEM) * iAlloc;

        if(iAllocSize == 0)
        {
            free(pList); /* Free original buffer and return */
        }
        else
        {
            ITEM *tempList = (ITEM *) malloc(iAllocSize); /* Allocate temporary buffer */

            if (tempList == NULL) /* Memory allocation failed, throw error */
            {
                free(pList);
                ATLTRACE(_T("(CArray: Memory could not allocated. malloc failed.) "));
                throw CAtlException(E_OUTOFMEMORY);
            }

            if(pList == NULL) /* This is the first request to allocate memory to pList */
            {
                pList = tempList; /* assign newly allocated buffer to pList and return */
            } 
            else 
            {
                size_t iOldAllocSize = sizeof(ITEM) * iOldAlloc;        /* Allocation size before this request */
                size_t iMemCpySize = min(iOldAllocSize, iAllocSize);    /* Allocation size for current request */

                if(iMemCpySize > 0)
                {
                    /* MemCpy only upto the smaller of the sizes, since this could be request to shrink or grow */
                    /* If this is a request to grow, copying iAllocSize will result in an access violation */
                    /* If this is a request to shrink, copying iOldAllocSize will result in an access violation */
                    memcpy(tempList, pList, iMemCpySize); /* MemCpy returns tempList as return value, thus can be omitted */
                    free(pList); /* Free old buffer */
                    pList = tempList; /* Assign newly allocated buffer and return */
                }
            }

        }
    }
}

Notes:

  1. Objects are constructed and destructed correctly in both the old and new code.

  2. No memory leaks detected (as reported by Visual Studio built in CRT Debug heap functions: http://msdn.microsoft.com/en-us/library/e5ewb1h3(v=vs.90).aspx)

  3. I wrote a small test harness (console app) that does the following:

    a. Add 500000 instances of class containing 2 integers and an STL string.

    Integers added are running counter and its string representations like so:

    for(int i = 0; i < cItemsToAdd; i++)
    {
        ostringstream str;
        str << "x=" << 1+i << "\ty=" << cItemsToAdd-i << endl;
        TestArray value(1+i, cItemsToAdd-i, str.str());
        array.Append(&value);
    }
    

    b. Open a big log file containing 86526 lines of varying lengths, adding to an instance of this array: CArray of CStrings and CArray of strings.

I ran the test harness with the existing method (baseline) and my modified method. I ran it in both debug and release builds.

The following are the results:

Test-1: Debug build -> Adding class with int,int,string, 100000 instances:

Original implementation: 5 seconds, Modified implementation: 12 seconds

Test-2: Debug build -> Adding class with int,int,string, 500000 instances:

Original implementation: 71 seconds, Modified implementation: 332 seconds

Test-3: Release build -> Adding class with int,int,string, 100000 instances:

Original implementation: 2 seconds, Modified implementation: 7 seconds

Test-4: Release build -> Adding class with int,int,string, 500000 instances:

Original implementation: 54 seconds, Modified implementation: 183 seconds

Reading big log file into CArray of CString objects:

Test-5: Debug build -> Read big log file with 86527 lines CArray of CString

Original implementation: 5 seconds, Modified implementation: 5 seconds

Test-6: Release build -> Read big log file with 86527 lines CArray of CString

Original implementation: 5 seconds, Modified implementation: 5 seconds

Reading big log file into CArray of string objects:

Test-7: Debug build -> Read big log file with 86527 lines CArray of string

Original implementation: 12 seconds, Modified implementation: 16 seconds

Test-8: Release build -> Read big log file with 86527 lines CArray of string

Original implementation: 9 seconds, Modified implementation: 13 seconds

Questions:

  1. As you can see from the above tests, realloc is consistently faster compared to memalloc, memcpy and free. In some instances (Test-2 for eg) its faster by a whopping 367%. Similarly for Test-4 it is 234%. So what can I do to get these numbers down that is comparable to realloc implementation?

  2. Can my version be made more efficient?

Assumptions:

  1. Please note that I cannot use C++ new and delete. I have to use only malloc and free. I also cannot change any of the other methods (as it is existing functionality) and impacts are huge. So my hands are tied to get the best implementation of realloc that I possibly can.

  2. I have verified that my modified implementation is functionally correct.

PS: This is my first SO post. I have tried to be as detailed as possible. Suggestions on posting is also appreciated.

Upvotes: 3

Views: 1112

Answers (3)

AndersK
AndersK

Reputation: 36092

If you look at an implementation of realloc e.g.

http://www.scs.stanford.edu/histar/src/pkg/uclibc/libc/stdlib/malloc/realloc.c

you see that the difference between your implementation and an existing one is that it expands the memory heap block instead of creating a whole new block by using low-level calls. This probably accounts for some of the speed difference.

I think you also need to consider the implications of memset of memory every time you do a realloc because then a performance degradation seems inevitable.

I find the argument about realloc leaving code in the memory is somewhat overly paranoid because the same can be said about normal malloc/calloc/free. It would mean that you would not only need to find all reallocs/malloc/callocs but also any runtime or 3rd party function that internally uses those functions to be really sure that nothing is kept in memory alternatively another way would be to create your own heap and replace it with the regular one to keep it clean.

Upvotes: 2

Eli Algranti
Eli Algranti

Reputation: 9007

First of all I'd like to point out you are not addressing the vulnerability as the memory released by free is not being cleared as well, same as realloc.

Also note your code does more than the old realloc: it throws an exception when out of memory. Which may be futile.

Why is your code slower than realloc? Probably because realloc is using under the hood shortcuts which are not available to you. For example realloc may be allocating more memory than you actually request, or allocating contiguous memory just after the end of the previous block, so your code is doing more memcpy's than realloc.

Point in case. Running the following code in CompileOnline gives result Wow no copy

#include <iostream>
#include <stdlib.h>

using namespace std;

int main()
{

    void* p = realloc(NULL, 1024);
    void* t = realloc(p, 2048);

    if (p == t)
        cout << "Wow no copy" << endl; 
    else
        cout << "Alas, a copy" << endl; 
    return 0;
}

What can you do to make your code faster? You can try to allocate more memory after the currently allocated block, but then freeing the memory becomes more problematic as you need to remember all the pointers you allocated, or find a way to modify the lookup tables used by free to free the correct amount of memory on one go.

OR

Use the common strategy of (internally) allocating twice as much memory as you previously allocated and (optionally) shrink the memory only when the new threshold is less than half the allocated memory.
This gives you some head room so not every time memory grows is it necessary to call malloc/memcpy/free.

Upvotes: 2

c-smile
c-smile

Reputation: 27470

Conceptually realloc() is not doing anything too smart - it allocates memory by some blocks exactly as you do in your ReAllocNew.

The only conceptual difference can be in the way how new block size is calculated.

realloc may use something like this:

int new_buffer_size = old_buffer_size * 2;

and this will decrease number of memory moves from what you have there. In any case I think that block size calculation formula is the key factor.

Upvotes: 1

Related Questions