Gordon

Reputation: 6873

registry value & write to XML, encoding issues?

I am using PowerShell to gather some data from the Uninstall key of the registry and write it to XML, and everything works right up until the data to be written includes some Simplified Chinese characters. When I look at the registry itself, the value of DisplayName is

Object Enabler for AutoCAD Plant 3D 2023 - 简体中文 (Simplified Chinese)

But when I use

Write-Host "$($uninstallKey.GetValue('DisplayName'))"

I get

Object Enabler for AutoCAD Plant 3D 2023 - 简体中文 (Simplified Chinese) EE- 88}

Not sure where that EE- 88} is coming from, and what else might be hiding there. At first, I thought my issue was with the encoding of the XML file at write. I had been using [System.Text.UTF8Encoding] which throws an error

Exception calling "Save" with "1" argument(s): "'.', hexadecimal value 0x00, is an invalid character."

But now I think the problem is elsewhere, since a Write-Host shows something different from what I see in the registry itself.

I am using

$localMachineHive = [Microsoft.Win32.RegistryKey]::OpenBaseKey([Microsoft.Win32.RegistryHive]::LocalMachine, 0)
$uninstallKey = $localMachineHive.OpenSubKey("$uninstallKeyPath\$uninstallKeyName")

to access the registry, where "$uninstallKeyPath\$uninstallKeyName" defines the key path (64-bit or 32-bit) to the individual key. I recently moved to this approach because it is much faster than PowerShell's native registry access. But perhaps there is an encoding nuance there that I am missing? Or is this a place where Write-Host is the problem?

EDIT: Verified that the mechanism for accessing the registry isn't the issue. Both of these produce the same output, which doesn't match what I see in RegEdit.

$localMachineHive = [Microsoft.Win32.RegistryKey]::OpenBaseKey([Microsoft.Win32.RegistryHive]::LocalMachine, 0)
$uninstallKey = $localMachineHive.OpenSubKey("SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{BF3F377C-AF47-33EE-979F-67D4EFA9FAB0}")
Write-Host "$($uninstallKey.GetValue('DisplayName'))"

$displayName = Get-ItemPropertyValue -Path 'Registry::HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{BF3F377C-AF47-33EE-979F-67D4EFA9FAB0}' -Name DisplayName
Write-Host "$displayName"

Upvotes: 2

Views: 335

Answers (1)

zett42

Reputation: 27786

This looks like an error in the data stored in the registry, probably a mismatch between the actual string length and the number of bytes passed in the cbData parameter of RegSetValueEx() (the native API for writing registry values).

If the program that wrote the registry value passed an argument for cbData that is too large, the API stores data beyond the end of the actual string: whatever happens to be in memory after the intended data, which could be just random "garbage" or, in the worst case, confidential data like passwords.

When PowerShell reads the registry value, it gets the null terminator character and any additional characters, which might appear as random characters. Note that RegEdit doesn't show these characters.
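
Since the console may not render these characters faithfully, one way to see what is really in the string is to look at its length and at the individual UTF-16 code units. The following is only a sketch; the sample string is made up and merely stands in for a value returned by GetValue():

```powershell
# Hypothetical sample standing in for a value read from the registry:
# the intended text, a null terminator, then leftover data.
$displayName = 'Plant 3D 2023' + [char]0 + 'EE-'

# Length counts every stored character, including the ones the console hides.
$storedLength = $displayName.Length

# Position of the first null character, or -1 if there is none.
$nullIndex = $displayName.IndexOf([char]0)

# Hex dump of each UTF-16 code unit; an embedded null appears as 0000.
$hexDump = ($displayName.ToCharArray() | ForEach-Object { '{0:X4}' -f [int]$_ }) -join ' '

"Length: $storedLength, first null at index: $nullIndex"
$hexDump
```

If the reported length is larger than the visible text, or the dump contains 0000 anywhere, the value carries data past its terminator.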

Workaround

Remove all characters starting from the null terminator character up to the end of the string:

# Using a RegEx to remove the first null character and any following characters
$displayName -replace '\0.*'

# alternatively:
($displayName -split ([char] 0), 2)[0]
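
One caveat with the RegEx variant: by default `.` does not match a line feed, so if the leftover data happens to contain one, everything after it survives the replace. A small helper that cuts at the index of the first null avoids this. This is a sketch; the function name Get-StringBeforeNull is made up for illustration:

```powershell
# Hypothetical helper: returns the part of a string before the first null character.
function Get-StringBeforeNull {
    param([string]$Value)

    if ([string]::IsNullOrEmpty($Value)) { return $Value }

    $nullIndex = $Value.IndexOf([char]0)
    if ($nullIndex -lt 0) { return $Value }   # nothing to trim

    return $Value.Substring(0, $nullIndex)
}

# Made-up sample: intended text, a null, then leftover data that contains a line feed.
$raw = "MyUser" + [char]0 + "junk`nmore"

$viaRegex  = $raw -replace '\0.*'        # stops at the line feed, keeps "`nmore"
$viaHelper = Get-StringBeforeNull $raw   # keeps only "MyUser"

$viaRegex.Length
$viaHelper.Length
```

With the null character and everything after it gone, the value should no longer trigger the "hexadecimal value 0x00, is an invalid character" error when the XML is saved.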

Repro

Trying to actually reproduce the problem, I've created a bogus C++ console application:

#include <windows.h>

struct Test {
    wchar_t const user[7] = L"MyUser";
    wchar_t const password[6] = L"MyPwd";
};

int main()
{
    // Create or open a registry key
    HKEY regKey = nullptr;
    ::RegCreateKeyExW( HKEY_CURRENT_USER, L"_TestKey", 0,  nullptr, 0, KEY_READ | KEY_WRITE, nullptr, &regKey, nullptr );

    // Attempt to write the string member data.user, but pass a value for cbData
    // that is twice the number of actual bytes
    Test data;
    ::RegSetValueExW( regKey, L"FooBar", 0, REG_SZ, reinterpret_cast<BYTE const*>( &data.user ), sizeof( data.user ) * 2 );

    // Release the key handle
    ::RegCloseKey( regKey );
}

By passing twice the actual number of bytes for cbData, the code unintentionally writes the value of the password member after the intended value of the user member into the registry, separated by a null character.
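
The effect of cbData can also be illustrated without touching the registry. The sketch below only simulates the memory of the Test struct as a byte array (an assumption about its layout: user directly followed by password, no padding) and decodes as many bytes as a given cbData would cover:

```powershell
# Simulated memory of the Test struct: user[7] directly followed by password[6],
# both null-terminated UTF-16LE strings (7*2 + 6*2 = 26 bytes).
$memory = [System.Text.Encoding]::Unicode.GetBytes("MyUser`0MyPwd`0")

$userBytes = 7 * 2   # corresponds to sizeof( data.user ) in the C++ code

# Correct cbData: only the user string and its terminator would be stored.
$storedCorrect = [System.Text.Encoding]::Unicode.GetString($memory, 0, $userBytes)

# Doubled cbData, clamped to the simulated buffer so this sketch never reads out of range.
$tooLarge = [Math]::Min($userBytes * 2, $memory.Length)
$storedTooLarge = [System.Text.Encoding]::Unicode.GetString($memory, 0, $tooLarge)

$storedCorrect.Length                 # 7  ("MyUser" plus terminator)
$storedTooLarge.Length                # 13 (both strings plus both terminators)
$storedTooLarge.Split([char]0)[1]     # MyPwd
```

The real C++ program has no such clamp, so it reads whatever follows the struct in memory.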

PowerShell code that reads the value:

$hkcu = [Microsoft.Win32.RegistryKey]::OpenBaseKey([Microsoft.Win32.RegistryHive]::CurrentUser, 0)
$regkey = $hkcu.OpenSubKey('_TestKey')
$regkey.GetValue('FooBar')

Output:

MyUserMyPwd

Note that the console output does not show the null character between "MyUser" and "MyPwd", but it is still part of the string: if you read the registry value into a variable, the null character will be there.


Bonus Code

Out of curiosity, I wrote a script that lists all registry string values that contain embedded null characters (excluding REG_MULTI_SZ values, which may contain embedded null characters by design).

Example:

.\Get-RegStringsWithEmbeddedNull.ps1 -Hive LocalMachine -View Registry64 -EA Ignore
.\Get-RegStringsWithEmbeddedNull.ps1 -Hive LocalMachine -View Registry32 -EA Ignore

On my machine, this lists over 500 values! In many cases the difference in length between the stored string and the actual string (trimmed using -replace '\0.*') is only 1 character (so only an extra null is stored), which makes it especially hard for the unsuspecting developer to diagnose problems when working with such values, because PowerShell doesn't display embedded null characters. The only way to diagnose these off-by-1 errors is by looking at the Length property of the string.
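
For readers who want to run a similar check, here is a much smaller sketch of the idea. It is not the script mentioned above: the function names are made up, and it only looks at the direct subkeys of a single key (the Uninstall key by default) instead of walking a whole hive.

```powershell
# Hypothetical helper: describes a string that contains an embedded null, or returns nothing.
function Get-EmbeddedNullInfo {
    param([string]$KeyName, [string]$ValueName, [string]$Data)

    $nullIndex = $Data.IndexOf([char]0)
    if ($nullIndex -lt 0) { return }

    [pscustomobject]@{
        Key           = $KeyName
        Name          = $ValueName
        StoredLength  = $Data.Length
        TrimmedLength = $nullIndex
    }
}

# Hypothetical scanner (Windows only): checks REG_SZ and REG_EXPAND_SZ values
# in the direct subkeys of one key. REG_MULTI_SZ is skipped on purpose.
function Find-RegStringsWithNull {
    param(
        [string]$Hive = 'LocalMachine',
        [string]$View = 'Registry64',
        [string]$Path = 'SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall'
    )

    $base = [Microsoft.Win32.RegistryKey]::OpenBaseKey(
        [Microsoft.Win32.RegistryHive]$Hive, [Microsoft.Win32.RegistryView]$View)
    try {
        $parent = $base.OpenSubKey($Path)
        if ($null -eq $parent) { return }
        try {
            foreach ($subName in $parent.GetSubKeyNames()) {
                $key = $parent.OpenSubKey($subName)
                if ($null -eq $key) { continue }
                try {
                    foreach ($valueName in $key.GetValueNames()) {
                        $kind = $key.GetValueKind($valueName)
                        $isString = $kind -eq [Microsoft.Win32.RegistryValueKind]::String -or
                                    $kind -eq [Microsoft.Win32.RegistryValueKind]::ExpandString
                        if (-not $isString) { continue }

                        $data = $key.GetValue($valueName, '',
                            [Microsoft.Win32.RegistryValueOptions]::DoNotExpandEnvironmentNames)
                        Get-EmbeddedNullInfo -KeyName $key.Name -ValueName $valueName -Data $data
                    }
                } finally { $key.Dispose() }
            }
        } finally { $parent.Dispose() }
    } finally { $base.Dispose() }
}
```

Usage would be along the lines of `Find-RegStringsWithNull -View Registry32 | Format-Table`; a StoredLength that is exactly one more than TrimmedLength is the "only an extra null" case described above.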

Conclusion:

In general, it is a good idea to trim any registry string value of type REG_SZ or REG_EXPAND_SZ at the first null character. There might be cases where embedded null characters are actually intended, but these are rare and against the spec (the developer should have chosen REG_MULTI_SZ instead). Most cases seem to be caused by programmer errors. The C APIs are easy to use incorrectly: some expect you to pass the character count, others expect you to include the null terminator, and others require you to pass the buffer size (in characters or even in bytes), which might be larger than the actual string length.

Upvotes: 2
