Mr.C64

Reputation: 42984

Successive calls to RegGetValue return two different sizes for the same string

In some code I use the Win32 RegGetValue() API to read a string from the registry.

I call the aforementioned API twice:

  1. The purpose of the first call is to get the proper size to allocate a destination buffer for the string.

  2. The second call reads the string from the registry into that buffer.

Oddly, RegGetValue() returns different size values in the two calls.

In particular, the size value returned in the second call is two bytes (equivalent to one wchar_t) less than the first call.

The size returned by the second call is the one that matches the actual string length, including the terminating NUL.
What I don't understand is why the first call returns a size two bytes (one wchar_t) bigger than that.

A screenshot of program output and Win32 C++ compilable repro code are attached.

[Screenshot: different size values returned by RegGetValue()]


Repro Source Code

#include <windows.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;


void PrintSize(const char* const message, const DWORD sizeBytes)
{
    cout << message << ": " << sizeBytes << " bytes (" 
         << (sizeBytes/sizeof(wchar_t)) << " wchar_t's)\n";
}


int main()
{
    const HKEY key = HKEY_LOCAL_MACHINE;
    const wchar_t* const subKey = L"SOFTWARE\\Microsoft\\Windows\\CurrentVersion";
    const wchar_t* const valueName = L"CommonFilesDir";

    //
    // Get string size
    //
    DWORD keyType = 0;
    DWORD dataSize = 0;
    const DWORD flags = RRF_RT_REG_SZ;
    LONG result = ::RegGetValue(
        key, 
        subKey,
        valueName, 
        flags, 
        &keyType, 
        nullptr, 
        &dataSize);
    if (result != ERROR_SUCCESS)
    {
        cout << "Error: " << result << '\n';
        return 1;
    }
    PrintSize("1st call size", dataSize);
    const DWORD dataSize1 = dataSize; // store for later use


    //
    // Allocate buffer and read string into it
    //
    vector<wchar_t> buffer(dataSize / sizeof(wchar_t));
    result = ::RegGetValue(
        key, 
        subKey,
        valueName, 
        flags, 
        nullptr, 
        &buffer[0], 
        &dataSize);
    if (result != ERROR_SUCCESS)
    {
        cout << "Error: " << result << '\n';
        return 1;
    }
    PrintSize("2nd call size", dataSize);

    const wstring text(buffer.data());
    cout << "Read string:\n";
    wcout << text << '\n';
    wcout << wstring(dataSize/sizeof(wchar_t), L'*')  << "  <-- 2nd call size\n";
    wcout << wstring(dataSize1/sizeof(wchar_t), L'-') << "  <-- 1st call size\n"; 
}

Operating System: Windows 7 64-bit with SP1


EDIT

Some confusion seems to have arisen from the particular registry key I happened to read in the sample repro code.
So, let me clarify that I read that key from the registry just as a test. This is not production code, and I'm not interested in that particular key. Feel free to add a simple test key to the registry with some test string value.
Sorry for the confusion.

Upvotes: 6

Views: 2394

Answers (2)

Mr.C64

Reputation: 42984

This blog post (published on February 14th, 2024) clarifies the issue:

The Old New Thing - Functions that return the size of a required buffer generally return upper bounds, not tight bounds

There are a number of functions in Windows that are part of a three-phase operation:

  1. Request the size of a buffer needed to receive some data.
  2. Allocate a buffer of that size.
  3. Call the function again with that buffer.

When you ask for the required size of a buffer, it is not uncommon for the function to return a value that is larger than the actual value you get from step 3, when you ask for the data to be placed in the buffer.

[…] Given that the caller has to be prepared for the size to change anyway, the “how big of a buffer do I need” call can return an over-estimate of the required size, since that will allow the second call for the data to succeed (assuming the data hasn’t changed). And giving an over-estimate is often much easier than giving an exact value.

I think the official MSDN documentation should be updated with that information.
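
The practical consequence of the quoted advice can be sketched as a small retry loop: treat the first size as an estimate, trust only the size reported by the call that actually filled the buffer, and be ready to grow the buffer if the data changed in between. This is a minimal, portable sketch and not code from the blog post. `QueryStatus`, `StringQuery` and `ReadString` are hypothetical names; the callable stands in for an API such as RegGetValue(), and the retry limit of 4 is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for ERROR_SUCCESS, ERROR_MORE_DATA and any other error.
enum class QueryStatus { Success, MoreData, Failure };

// Hypothetical two-phase query. With a null buffer it only reports a size in
// sizeBytes, which may be an over-estimate. With a buffer, sizeBytes is the
// capacity on input and the number of bytes actually written on output
// (or the newly required size when the status is MoreData).
using StringQuery = std::function<QueryStatus(wchar_t* buffer, std::size_t& sizeBytes)>;

bool ReadString(const StringQuery& query, std::wstring& text)
{
    assert(query && "query must be callable");

    std::size_t sizeBytes = 0;
    if (query(nullptr, sizeBytes) != QueryStatus::Success)
        return false;

    // Retry a bounded number of times in case the data grows between calls.
    for (int attempt = 0; attempt < 4; ++attempt)
    {
        // One spare character: the buffer is never empty and odd byte counts still fit.
        std::vector<wchar_t> buffer(sizeBytes / sizeof(wchar_t) + 1);
        sizeBytes = buffer.size() * sizeof(wchar_t);

        const QueryStatus status = query(buffer.data(), sizeBytes);
        if (status == QueryStatus::Success)
        {
            // Trust the size from this call, not the earlier estimate.
            std::size_t length = std::min(sizeBytes / sizeof(wchar_t), buffer.size());
            while (length > 0 && buffer[length - 1] == L'\0')
                --length; // drop trailing terminator(s)
            text.assign(buffer.data(), length);
            return true;
        }
        if (status != QueryStatus::MoreData)
            return false;
        // MoreData: sizeBytes now holds the larger required size; loop again.
    }
    return false;
}
```

On Windows, the callable would presumably wrap ::RegGetValue() and map ERROR_MORE_DATA to `QueryStatus::MoreData`; that wrapper is left out here so the sketch stays platform-independent.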

Upvotes: 0

Remy Lebeau

Reputation: 597061

RegGetValue() is safer than RegQueryValueEx() because it artificially adds a null terminator to the output of a string value if it does not already have a null terminator.

The first call returns the data size plus room for an extra null terminator in case the actual data is not already null terminated. I suspect RegGetValue() does not look at the real data at this stage; it just does an unconditional data size + sizeof(wchar_t) to be safe.

(36 * sizeof(wchar_t)) + (1 * sizeof(wchar_t)) = 74

The second call returns the real size of the actual data that was read. That size would include the extra null terminator only if one had to be artificially added. In this case, your data has 35 characters in the path, and a real null terminator is present (which well-behaved apps are supposed to write), so the extra null terminator did not need to be added.

((35+1) * sizeof(wchar_t)) + (0 * sizeof(wchar_t)) = 72
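
The arithmetic above can be modelled in a few lines. This is only a sketch of the hypothesis in this answer, not RegGetValue()'s actual implementation. The function names are made up for illustration, and `char16_t` stands in for the 2-byte Windows wchar_t so the numbers come out the same on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for Windows' 2-byte wchar_t, so results do not depend on the platform.
using RegChar = char16_t;

// Hypothetical model of the size-only call: stored bytes plus room for one
// extra terminator, added unconditionally (an assumption, per the answer above).
// `stored` is the raw stored data, including the stored terminator if any.
std::size_t SizeQueryBytes(const std::u16string& stored)
{
    return (stored.size() + 1) * sizeof(RegChar);
}

// Hypothetical model of the data call: an extra terminator is counted only
// when the stored data does not already end with one.
std::size_t DataReadBytes(const std::u16string& stored)
{
    const bool terminated = !stored.empty() && stored.back() == u'\0';
    return (stored.size() + (terminated ? 0 : 1)) * sizeof(RegChar);
}

// Number of characters before the terminator, given the size from the data call.
std::size_t TextLength(std::size_t dataReadBytes)
{
    assert(dataReadBytes >= sizeof(RegChar));
    return dataReadBytes / sizeof(RegChar) - 1;
}
```

For a 35-character path stored with its terminator (36 characters of data), `SizeQueryBytes` gives 74 and `DataReadBytes` gives 72, matching the two figures above; `TextLength(72)` gives 35.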

Now, with that said, you really should not be reading from the Registry directly to get the CommonFilesDir path (or any other system path) in the first place. You should be using SHGetFolderPath(CSIDL_PROGRAM_FILES_COMMON) or SHGetKnownFolderPath(FOLDERID_ProgramFilesCommon) instead. Let the Shell deal with the Registry for you. That approach stays consistent across Windows versions, whereas Registry settings are subject to being moved around from one version to another, and it also accounts for per-user paths vs system-global paths. These are the main reasons why the CSIDL API was introduced in the first place.

Upvotes: 11
