user1911221

Reputation: 13

Regex a binary with GC in PowerShell

Question: What PowerShell regex pattern will return output like that of the strings command in Bash?

I found an article on gc and Select-String: Episode #137: Free-base64-ing. http://blog.commandlinekungfu.com/2011/03/episode-137-free-base64-ing.html

I tried a number of regex patterns from a previous question: Regular Expression for alphanumeric and underscores.

If I run in Bash: strings --all myfile.bin Results: 52939 lines of character strings.

gc .\myfile.bin | Select-String -AllMatches "^[a-zA-Z0-9_]*$" Results: a number of blank lines.

gc .\myfile.bin | Select-String -AllMatches "^\w*$" Results: 9 lines of characters and a number of blank lines.

gc .\myfile.bin | Select-String -AllMatches "^\w+$" Results: 9 lines of characters.

gc .\myfile.bin | Select-String -AllMatches "[A-Za-z0-9_]" Results: Pretty much the entire file, unprintable characters and all.

gc .\myfile.bin | Select-String -AllMatches "^[\p{L} \p{Nd}_]+$" Results: 20 lines of characters.

So what's the regex trick that I am missing?

Upvotes: 1

Views: 708

Answers (2)

user265537

Reputation:

As mentioned, the lack of line breaks will prevent line-anchored regex patterns (those using ^ and $) from working. Microsoft Sysinternals' strings utility is a good solution.

If you need a native PowerShell solution, ping me. I wrote a Get-Strings cmdlet in C# that does ASCII (UTF8) and Unicode (UTF16) string extraction from binaries. It is not as fast as Sysinternals, but does have the advantage of putting the output into the PowerShell pipeline.
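For readers who want to experiment in the meantime, here is a rough PowerShell sketch of the UTF-16 side of the idea. It is not the Get-Strings cmdlet mentioned above: the function name Get-Utf16Run, its parameters, and the minimum length of 4 are all assumptions made for illustration. It only handles little-endian UTF-16 text limited to printable ASCII characters.

```powershell
# Hypothetical helper; a sketch only, not the Get-Strings cmdlet mentioned above.
function Get-Utf16Run {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $MinLength = 4   # assumed minimum number of characters per string
    )
    # Read raw bytes so nothing is split on newlines or re-encoded.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)
    # Latin-1 (code page 28591) maps each byte to exactly one character.
    $text = [System.Text.Encoding]::GetEncoding(28591).GetString($bytes)
    # UTF-16LE stores a printable ASCII character as that byte followed by 0x00.
    $pattern = '(?:[\x20-\x7E]\x00){' + $MinLength + ',}'
    [regex]::Matches($text, $pattern) | ForEach-Object {
        # Drop the interleaved zero bytes to recover the readable text.
        $_.Value -replace '\x00', ''
    }
}
```

Calling Get-Utf16Run -Path .\myfile.bin writes each recovered string to the pipeline, so the results can be filtered with Where-Object or Select-String like any other output.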

Upvotes: 0

Ansgar Wiechers

Reputation: 200453

You're missing that binary files don't consist of "lines" in the way text files do. Therefore ^ and $ won't do you any good here.
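A small illustration of the difference, using a made-up sample string rather than the real file (the variable names and the sample content are assumptions for demonstration only):

```powershell
# Hypothetical sample: one chunk as Get-Content might hand it over, mixing
# readable text with control characters (not taken from the real file).
$chunk = "$([char]1)$([char]2)hello_world$([char]3)$([char]0)abc"

# Anchored: the whole chunk would have to consist of word characters.
$anchored = $chunk -match '^\w+$'

# Unanchored: find each run of 4 or more word characters inside the chunk.
$runs = [regex]::Matches($chunk, '\w{4,}') | ForEach-Object { $_.Value }

# An empty line satisfies '^\w*$', which is where blank output lines come from.
$emptyMatches = '' -match '^\w*$'
```

Here the anchored test fails because of the control characters, while the unanchored search still finds the readable run inside the chunk.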

While arguably not the most elegant solution, something like this might do:

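# Replace every character that is not a word character or a space with a
# newline, split on those newlines, keep runs of 3+ characters, then trim.
cat .\myfile.bin `
  | % { $_ -replace '[^\w ]', "`n" } `
  | % { $_.Split("`n") } `
  | ? { $_ -match '.{3,}' } `
  | % { $_.Trim() }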
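A variant that sidesteps lines entirely might read the file as raw bytes and search for runs of printable characters. This is only a sketch under a few assumptions: the function name Get-PrintableRun and its parameters are invented for illustration, the minimum length of 4 is chosen to mirror the default of GNU strings, and only single-byte ASCII text is considered.

```powershell
# Hypothetical helper, not a built-in cmdlet.
function Get-PrintableRun {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $MinLength = 4   # assumed default, mirroring GNU strings
    )
    # Read raw bytes so no newline splitting or text decoding gets in the way.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)
    # Latin-1 (code page 28591) maps each byte to exactly one character.
    $text = [System.Text.Encoding]::GetEncoding(28591).GetString($bytes)
    # Unanchored pattern: runs of printable ASCII (or tab), at least $MinLength long.
    $pattern = '[\x20-\x7E\t]{' + $MinLength + ',}'
    [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value }
}
```

Usage would look like Get-PrintableRun -Path .\myfile.bin, optionally with -MinLength 3 to match the three-character threshold used in the pipeline above. Because the whole file is loaded into memory, this is better suited to modestly sized binaries.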

Or, you could use Sysinternals' strings utility.

Upvotes: 1
