Karolsweats

Reputation: 11

Why won't PowerShell script find character?

I have a PowerShell script that runs daily. It goes through all the files in a folder (there are 2000+), finds a particular character, and replaces it with a line break. The character shows as an up-arrow character in Notepad and as FF in Notepad++.

$filename = Get-ChildItem "C:\Scripts\*filename*.*"
$filename | % {
    (gc $_) -replace "","`n`f" | Set-Content $_.fullname
}

As seen, the code block doesn't show the arrow, but as text it does. I can do a manual find and replace, but when the PowerShell script runs from Task Scheduler it doesn't seem to find anything to replace. Is there a different way of going about this?

Any help is appreciated!

Upvotes: 0

Views: 526

Answers (3)

js2010

Reputation: 27423

Here's my guess as to what's going on: you want to replace the form feed (0C, "`f") with a carriage return and line feed (0D 0A, "`r`n"), which is the Windows line ending.

"hi`fhow are you`f" | set-content file.txt -NoNewline
format-hex file.txt


           Path: C:\users\admin\foo\file.txt

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   68 69 0C 68 6F 77 20 61 72 65 20 79 6F 75 0C     hi.how are you.


(get-content file.txt) -replace "`f","`r`n" | set-content file2.txt -NoNewline
format-hex file2.txt


           Path: C:\users\admin\foo\file2.txt

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   68 69 0D 0A 68 6F 77 20 61 72 65 20 79 6F 75 0D  hi..how are you.
00000010   0A                                               .
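
Applied to a whole folder, the same replacement could look like the sketch below. The function name Convert-FormFeedToNewline and its -Path parameter are made up for illustration. It also assumes the files can be read and written with the default encoding, and it uses Get-Content -Raw so that multi-line files keep their existing line breaks when written back with -NoNewline.

```powershell
# Hypothetical helper: replaces every form feed in the matching files with CRLF.
function Convert-FormFeedToNewline {
    param(
        # Wildcard path, e.g. 'C:\Scripts\*filename*.*' as in the question.
        [Parameter(Mandatory)][string]$Path
    )

    foreach ($file in Get-ChildItem -Path $Path -File) {
        # -Raw returns the file as one string instead of an array of lines.
        $text = Get-Content -LiteralPath $file.FullName -Raw

        # Nothing to do for empty files.
        if ([string]::IsNullOrEmpty($text)) { continue }

        $fixed = $text -replace "`f", "`r`n"

        # -NoNewline stops Set-Content from appending an extra line break.
        Set-Content -LiteralPath $file.FullName -Value $fixed -NoNewline
    }
}
```

It would be called as Convert-FormFeedToNewline -Path 'C:\Scripts\*filename*.*'. If the files are not plain ASCII, an explicit -Encoding on both Get-Content and Set-Content is probably needed.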

Upvotes: 1

Darin

Reputation: 2368

Okay, I think I figured out what character you are having problems with. It appears to be 0x0C. I used HxD to edit a text file and placed all the characters from 0x00 to 0x1F in there and found 0x0C was the fat up arrow.

I've rewritten the code into what should work. If it doesn't, try Keith's answer and replace the "\u2191" with "\u000C", or with "\x0C" if that doesn't work. I'm not sure how regex handles Unicode escapes, but it should be something like this.

(gc $_) -replace "$([char]0x0C)","`n" | Set-Content $_.fullname

EDIT:

In the comments below mklement0 pointed out that this should be the Form Feed character. That being the case, then this version should work.

(gc $_) -replace "`f","`n" | Set-Content $_.fullname

But if none of these work, don't forget to try Keith's version, with both the "`f" and the Unicode "\u000C" variations in his code.
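
If you'd rather not use a hex editor to identify the character, something like the following sketch could list the control characters in a piece of text along with how often each occurs. Get-ControlCharacterCount is a made-up name, not a built-in cmdlet.

```powershell
# Hypothetical helper: counts control characters in a string by code point.
function Get-ControlCharacterCount {
    param([Parameter(Mandatory)][string]$Text)

    $counts = [ordered]@{}
    foreach ($ch in $Text.ToCharArray()) {
        # [char]::IsControl is true for code points such as 0x0C (form feed).
        if ([char]::IsControl($ch)) {
            $key = 'U+{0:X4}' -f [int]$ch
            if ($counts.Contains($key)) { $counts[$key]++ } else { $counts[$key] = 1 }
        }
    }
    $counts
}

# Example with a string containing two form feeds.
$found = Get-ControlCharacterCount -Text "hi`fhow are you`f"
```

For a real file you could pass (Get-Content -Raw <path>) as the -Text argument. Carriage returns and line feeds will show up too, as U+000D and U+000A.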

Upvotes: 0

Keith Langmead

Reputation: 1157

From what I can see in tests, Get-Content doesn't read files as Unicode by default. In Windows PowerShell it falls back to the system's ANSI code page when the file has no byte order mark, while PowerShell 7 defaults to UTF-8.

So in your script you'll want to specify the encoding explicitly. I'd also suggest referencing the specific character by its Unicode code point rather than the symbol, so that you don't need to worry about the format of the script file or the editor you're using. The following should do what you need (or at least does on my machine).

$filename = Get-ChildItem "C:\Scripts\unicode.txt"
$filename | % {
    (gc -Encoding utf8 $_) -replace "\u2191","`r`n" | Set-Content $_.fullname
}
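
As a quick check of the escape variations mentioned in this thread, the sketch below runs several spellings of the form feed through -replace. The sample string is invented. Since -replace takes a .NET regular expression, the regex escapes go in single quotes, while "`f" is expanded by PowerShell before the regex engine sees it.

```powershell
# Invented sample text with one form feed in the middle.
$sample = "page one`fpage two"

# Different ways of writing the same character (U+000C) as a -replace pattern.
$patterns = @(
    "`f",               # PowerShell escape, becomes a literal form feed
    '\f',               # regex escape for form feed
    '\x0C',             # regex hex escape (exactly two hex digits)
    '\u000C',           # regex Unicode escape (exactly four hex digits)
    "$([char]0x0C)"     # character built from its code point
)

$results = foreach ($p in $patterns) { $sample -replace $p, "`r`n" }

# U+2191 is the real up-arrow character, which is not what is in the sample.
$matchesArrow = $sample -match '\u2191'
```

Every entry in $results should come out as the same two-line string, and $matchesArrow should be false. If so, "\u2191" only helps when the file really contains an up arrow rather than a form feed that an editor happens to draw as one.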

Upvotes: 0
