Martin Fric

Reputation: 730

PowerShell: unescape Unicode (UTF-8) escape sequences

I have the following function prepared:

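function UnescapeNonIsoChar($inputString) {
    # Find runs of \uXXXX escapes and decode each run as a sequence of UTF-8 bytes
    return [regex]::replace($inputString, '(?:\\u[0-9a-f]{4})+', { 
        param($m) 
        # '\u00e6\u00aa' -> '0x00e6 0x00aa ' -> split on whitespace -> cast each value to [byte]
        $utf8Bytes = (-split ($m.Value -replace '\\u([0-9a-f]{4})', '0x$1 ')).ForEach([byte])
        [text.encoding]::utf8.GetString($utf8Bytes) 
    })
}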

Everything works fine until I get \u2019, or any other escape above \u00ff (a value that no longer fits in a byte).

Then it throws this error:

Cannot convert value "0x2019" to type "System.Byte"
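The conversion fails because [byte] only holds values 0 to 255 (0x00 to 0xFF): every \u00XX escape in the profile.header lines fits, but 0x2019 does not. A minimal check of that boundary, using a hypothetical helper Test-ByteCast written only for this illustration:

```powershell
# Hypothetical helper: returns $true if a hex string such as '0x2019' converts to [byte]
function Test-ByteCast([string]$hex) {
    try {
        # PowerShell parses '0x...' strings as hexadecimal numbers
        $null = [byte]$hex
        return $true
    } catch {
        # Values above 0xFF raise "Cannot convert value ... to type System.Byte"
        return $false
    }
}

Test-ByteCast '0x00e6'   # True: 0xe6 = 230 fits in a byte
Test-ByteCast '0x2019'   # False: 0x2019 = 8217 does not
```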

Can anybody help me please?

EDIT (added input)

profile.header.profile=\u00e6\u00aa\u0094\u00e6\u00a1\u0088\u00e5\u0090\u008d\u00e7\u00a8\u00b1
profile.header.customer=\u00e5\u00ae\u00a2\u00e6\u0088\u00b6\u00e5\u0090\u008d\u00e7\u00a8\u00b1
profile.header.account=\u00e5\u00b8\u00b3\u00e8\u0099\u009f/\u00e6\u00a2\u009d\u00e4\u00bb\u00b6\u00e4\u00bb\u00a3\u00e7\u00a2\u00bc
profile.header.description=\u00e6\u008f\u008f\u00e8\u00bf\u00b0
layout.msg.updatePrimaryUsersLayout=Kindly save it as a New Layout as Primary user\u2019s layout cannot be updated.

This is what I receive. The point is to convert all escaped characters into readable form. It is something like a translation file: readable for the app, but not for the user. I need to unescape all characters to readable form in one step, so the user can either read or change them. And then, of course, I need to escape them back so the file is usable by the app.

Thanks

Upvotes: 0

Views: 1718

Answers (1)

JosefZ

Reputation: 30103

Using your sample input:

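function UnescapeNonIsoChar($inputString) {
    Try {
        # First attempt: decode each run of \uXXXX escapes as UTF-8 bytes
        # (works as long as every escape is \u00XX, i.e. fits in a byte)
        [regex]::replace($inputString, '(?:\\u[0-9a-f]{4})+', { 
            param($m) 
            $utf8Bytes = (-split ($m.Value -replace '\\u([0-9a-f]{4})', '0x$1 ')).ForEach([byte])
            [text.encoding]::utf8.GetString($utf8Bytes) 
        })
    } Catch {
        # An escape above \u00ff (e.g. \u2019) cannot be cast to [byte]:
        # fall back to ordinary unescaping of the whole line
        [regex]::Unescape($inputString)
    }
}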

@'
profile.header.profile=\u00e6\u00aa\u0094\u00e6\u00a1\u0088\u00e5\u0090\u008d\u00e7\u00a8\u00b1
profile.header.customer=\u00e5\u00ae\u00a2\u00e6\u0088\u00b6\u00e5\u0090\u008d\u00e7\u00a8\u00b1
profile.header.account=\u00e5\u00b8\u00b3\u00e8\u0099\u009f/\u00e6\u00a2\u009d\u00e4\u00bb\u00b6\u00e4\u00bb\u00a3\u00e7\u00a2\u00bc
profile.header.description=\u00e6\u008f\u008f\u00e8\u00bf\u00b0
layout.msg.updatePrimaryUsersLayout=Kindly save it as a New Layout as Primary user\u2019s layout cannot be updated.
'@ -split '\r?\n' |
    ForEach-Object {
        UnescapeNonIsoChar -inputString $_
    }

Output of .\SO\62679444.ps1:

profile.header.profile=檔案名稱
profile.header.customer=客戶名稱
profile.header.account=帳號/條件代碼
profile.header.description=描述
layout.msg.updatePrimaryUsersLayout=Kindly save it as a New Layout as Primary user’s layout cannot be updated.
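If you would rather not rely on the exception (and on the Catch branch unescaping the whole line at once), here is a sketch of a per-match variant. UnescapeMixed is a hypothetical name; it assumes, as in the sample input, that every run of \u00XX escapes holds UTF-8 bytes and that any escape above \u00ff is a single UTF-16 code unit. A run of \u00XX escapes that is not valid UTF-8 (for example Latin-1 text) would decode to U+FFFD replacement characters.

```powershell
# Hypothetical variant of UnescapeNonIsoChar: decides per match, no try/catch
function UnescapeMixed([string]$inputString) {
    # Group 'bytes': a run of \u00XX escapes, assumed to be UTF-8 bytes
    # Group 'unit' : a single \uXXXX escape above 0xFF, taken as a UTF-16 code unit
    $pattern = '(?<bytes>(?:\\u00[0-9a-fA-F]{2})+)|\\u(?<unit>[0-9a-fA-F]{4})'
    [regex]::Replace($inputString, $pattern, {
        param($m)
        if ($m.Groups['bytes'].Success) {
            # '\u00e6\u008f' -> '', 'e6', '8f' -> drop the empty piece -> parse as hex bytes
            $bytes = [byte[]]($m.Value -split '\\u00' |
                Where-Object { $_ } |
                ForEach-Object { [Convert]::ToByte($_, 16) })
            [Text.Encoding]::UTF8.GetString($bytes)
        } else {
            # e.g. '2019' -> [char]0x2019
            [string][char][Convert]::ToInt32($m.Groups['unit'].Value, 16)
        }
    })
}

UnescapeMixed 'layout.msg.updatePrimaryUsersLayout=Primary user\u2019s layout cannot be updated.'
# -> layout.msg.updatePrimaryUsersLayout=Primary user’s layout cannot be updated.
```

Piping the sample lines through UnescapeMixed instead of UnescapeNonIsoChar should give the same output as above, and it also copes with a line that mixes both kinds of escapes.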

Edit: … help me to do it the other way round? That is, from unescaped to escaped form?

You could use the following code snippet:

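$Readable = .\SO\62679444.ps1
Add-Type -AssemblyName System.Web    # makes [System.Web.HttpUtility] available
foreach ($line in $Readable) {
    ([char[]]$line | ForEach-Object {
        # keep characters up to 0xFF as they are
        if ([int]$_ -le 0xFF) { $_ } else {
            # UrlEncode yields the UTF-8 bytes as %xx; rewrite each one as \u00xx
            [System.Web.HttpUtility]::UrlEncode([string]$_) -replace '%', '\u00'
        }
    }) -join ''
}

Output: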
profile.header.profile=\u00e6\u00aa\u0094\u00e6\u00a1\u0088\u00e5\u0090\u008d\u00e7\u00a8\u00b1
profile.header.customer=\u00e5\u00ae\u00a2\u00e6\u0088\u00b6\u00e5\u0090\u008d\u00e7\u00a8\u00b1
profile.header.account=\u00e5\u00b8\u00b3\u00e8\u0099\u009f/\u00e6\u00a2\u009d\u00e4\u00bb\u00b6\u00e4\u00bb\u00a3\u00e7\u00a2\u00bc
profile.header.description=\u00e6\u008f\u008f\u00e8\u00bf\u00b0
layout.msg.updatePrimaryUsersLayout=Kindly save it as a New Layout as Primary user\u00e2\u0080\u0099s layout cannot be updated.

(Note that the last line comes out as \u00e2\u0080\u0099 rather than the original \u2019. Maybe use another conversion conditionally for lines not starting with the profile.header string?)
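One way to act on that idea is sketched below. EscapeMixed and its -AsUtf8Bytes switch are hypothetical, and choosing the mode by the profile.header. prefix is only a guess based on the sample file. Unlike the snippet above, it escapes everything above 0x7F (so characters 0x80 to 0xFF also become valid UTF-8 byte escapes), and the UTF-8 mode assumes characters from the Basic Multilingual Plane, since a surrogate pair encoded one char at a time would turn into replacement bytes.

```powershell
# Hypothetical inverse of the unescape step.
# Characters up to 0x7F are kept as-is; everything else is escaped.
#   -AsUtf8Bytes : each UTF-8 byte becomes \u00XX (format of the profile.header.* lines)
#   otherwise    : each UTF-16 code unit becomes \uXXXX (format of the \u2019 line)
function EscapeMixed([string]$line, [switch]$AsUtf8Bytes) {
    $sb = [System.Text.StringBuilder]::new()
    foreach ($ch in $line.ToCharArray()) {
        if ([int]$ch -le 0x7F) {
            [void]$sb.Append($ch)
        } elseif ($AsUtf8Bytes) {
            foreach ($b in [Text.Encoding]::UTF8.GetBytes([string]$ch)) {
                [void]$sb.Append(('\u{0:x4}' -f $b))
            }
        } else {
            [void]$sb.Append(('\u{0:x4}' -f [int]$ch))
        }
    }
    $sb.ToString()
}

# Sample lines built with [char] casts so the script file encoding does not matter
$lines = @(
    'profile.header.description=' + [char]0x63CF + [char]0x8FF0
    'layout.msg.updatePrimaryUsersLayout=Primary user' + [char]0x2019 + 's layout cannot be updated.'
)
foreach ($line in $lines) {
    # Guessed rule: profile.header.* lines use UTF-8 byte escapes, the rest do not
    EscapeMixed $line -AsUtf8Bytes:($line -like 'profile.header.*')
}
# profile.header.description=\u00e6\u008f\u008f\u00e8\u00bf\u00b0
# layout.msg.updatePrimaryUsersLayout=Primary user\u2019s layout cannot be updated.
```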

Upvotes: 2
