AlekseyHoffman

Reputation: 2694

Node.js + Powershell: how to encode Unicode characters when using stdin.write

This question is related to this issue: Powershell: how to execute command for a path containing Unicode characters?

I have a Node.js app that spawns a single child process running PowerShell 5.1 and then reuses it to run different commands, since that's faster than spawning a separate process for each command.

Problem

The problem is, commands containing Unicode characters are failing silently.

Code

let childProcess = require('child_process')
let testProcess = childProcess.spawn('powershell', [])
testProcess.stdin.setEncoding('utf-8')

testProcess.stdout.on('data', (data) => {
  console.log(data.toString())
})

testProcess.stdout.on('error', (error) => {
  console.log(error)
})

// This path is working, I get command output in the console:
// testProcess.stdin.write("(Get-Acl 'E:/test.txt').access\n");

// This path is not working. I get nothing in the console
testProcess.stdin.write("(Get-Acl 'E:/test 📚.txt').access\n");

Edit #1

I've tried to encode the paths to UTF-8 on the Node.js side before sending the command to Powershell and then casting it to System.Char:

const path = 'E:/test $([char]0x1f4da).txt'
const command = `Get-Acl $(${path}).access`
testProcess.stdin.write(`${command}\n`)

but I'm not sure how to do it properly. It seems like I'm not encoding it to the correct format. And it's not really a proper solution either, since I just encoded the emoji by hand. I would probably need to convert the whole path to UTF-16 or something similar to ensure there are no unsupported characters in it:

"E:/test 📚.txt".split("").reduce((hex,c) => hex += c.charCodeAt(0).toString(16).padStart(4,"0"),"")

Not sure it would even work
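
For reference, here is a quick check of what the one-liner above produces, as a minimal sketch (the helper names utf16Hex and codePointsHex are made up for illustration). split("") and charCodeAt work on UTF-16 code units, so the emoji comes out as two surrogate halves rather than as the single code point 0x1f4da, and that code point is too large for a 16-bit [char]:

```javascript
// Hex of each UTF-16 code unit: the same logic as the one-liner above.
function utf16Hex(str) {
  return str.split('').map((c) => c.charCodeAt(0).toString(16).padStart(4, '0'))
}

// Hex of each Unicode code point; Array.from uses the string iterator,
// which keeps surrogate pairs together.
function codePointsHex(str) {
  return Array.from(str, (c) => c.codePointAt(0).toString(16))
}

const emoji = '📚'
console.log(emoji.length)                // count of UTF-16 code units
console.log(utf16Hex(emoji))             // the two surrogate halves
console.log(codePointsHex(emoji))        // the single code point
console.log(Buffer.from(emoji, 'utf8'))  // the UTF-8 bytes Node.js would write to stdin
```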

Upvotes: 3

Views: 1230

Answers (1)

mklement0

Reputation: 440112

Try the following:

let childProcess = require('child_process')

let testProcess = childProcess.spawn(
  'chcp 65001 >NUL & powershell.exe -NonInteractive -NoProfile -Command -', 
  { shell: true }
)

testProcess.stdout.on('data', (data) => {
  console.log(data.toString())
})

testProcess.stdout.on('error', (error) => {
  console.log(error)
})

testProcess.stdin.write("Get-Item '📚.txt'\n");
  • While Node.js itself defaults to UTF-8, console applications spawned from it typically use the system's active OEM code page, such as code page 437 on US-English systems. That is typically a fixed single-byte encoding limited to 256 characters, so it cannot represent most Unicode characters.

  • powershell.exe, the Windows PowerShell CLI, is no exception, so in order to make it interpret its stdin input as UTF-8, the OEM code page must be explicitly set to the UTF-8 code page, 65001, before powershell.exe is launched.

  • Thus, { shell: true } is used to ensure that powershell.exe is launched via cmd.exe, the default shell on Windows. This allows chcp 65001 to run first, which performs the switch to the UTF-8 code page.

    • Note: This switch to UTF-8 as the OEM code page also affects subsequent calls to console applications in the same process.
  • Additionally:

    • -NonInteractive tells PowerShell that no user interaction is expected in the session. Notably, this prevents loading of the PSReadLine module used for command-line editing, which can cause problems with Unicode characters outside the BMP, i.e. characters with a code point higher than 0xFFFF (such as 📚), which require two [char] instances in .NET.

    • -NoProfile prevents loading (dot-sourcing) of the PowerShell profile files, given that (a) they're typically only needed in interactive sessions and (b) loading them not only slows things down but can also have side effects.

    • -Command - tells PowerShell to read commands from stdin; while omitting this parameter behaves somewhat similarly, doing so is the equivalent of -File -, which exhibits pseudo-interactive behavior.
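
If switching the code page is not an option, the escaping idea from the question could in principle be made to work by keeping the command text pure ASCII, so that the stdin encoding no longer matters. The following is only a sketch under assumptions: toPowerShellString is a hypothetical helper (not part of Node.js or PowerShell), and it relies on the .NET method [char]::ConvertFromUtf32, which returns a string and therefore also covers characters that need two [char] instances. Lone surrogates and other edge cases are not handled.

```javascript
// Hypothetical helper: builds an ASCII-only, double-quoted PowerShell string literal.
function toPowerShellString(text) {
  let out = ''
  for (const ch of text) {              // iterates by code point, not by UTF-16 code unit
    const cp = ch.codePointAt(0)
    if (cp < 0x20 || cp > 0x7e) {
      // Control or non-ASCII character: let PowerShell rebuild it from its code point.
      out += `$([char]::ConvertFromUtf32(0x${cp.toString(16)}))`
    } else if (ch === '`' || ch === '$' || ch === '"') {
      // Characters that are special inside double-quoted PowerShell strings
      // are escaped with a backtick.
      out += '`' + ch
    } else {
      out += ch
    }
  }
  return `"${out}"`
}

const command = `(Get-Acl ${toPowerShellString('E:/test 📚.txt')}).access`
console.log(command)
```

The resulting string could then be passed to testProcess.stdin.write(command + "\n") as in the code above.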

Upvotes: 4
