Reputation: 734
I'm trying to convert a Perl script into a PowerShell script. I'm having trouble with the part where the script reads a log file and has to determine the file's encoding.
Here is the perl code:
sub get_encoding {
    my $f = shift;
    my $fh;
    return "ASCII" if (!open ($fh,"<",$f));
    my $b = "";
    my $n = read ($fh,$b,2);
    close ($fh);
    return "UTF-16" if ($b eq "\x{ff}\x{fe}");
    return "ASCII";
}
It is called like so:
get_encoding ($l->{file})
Where $l->{file} is a path to the log file.
Can anyone explain what is going on, especially in this line:
return "UTF-16" if ($b eq "\x{ff}\x{fe}");
And if anyone knows a good way to do this in PowerShell, any tips are much appreciated.
Gísli
Upvotes: 0
Views: 1847
Reputation: 830
The program reads and examines the first 2 bytes of the given file to decide whether it should return the string "ASCII" or "UTF-16".
Here is a more detailed description:
If the file cannot be opened, for whatever reason, it returns "ASCII". (Weird, but that's what it does.)
return "ASCII" if (!open ($fh,"<",$f));
If the file is opened as file handle $fh, read($fh, $b, 2) reads the first 2 (8-bit) bytes into the variable $b. The return value of read, which is the number of bytes actually read, is stored in the variable $n, although it is never used later.
my $b = "";
my $n = read ($fh,$b,2);
The file handle $fh is closed right after the read.
close ($fh);
If the value of $b is exactly "\x{ff}\x{fe}", "UTF-16" is returned, although it would be more exact to return "UTF-16LE", since FF FE is the little-endian byte order mark. \x{..} represents a byte by its hex value, so there are two bytes in "\x{ff}\x{fe}", not 10 or 12.
return "UTF-16" if ($b eq "\x{ff}\x{fe}");
Finally, if $b is not equal to "\x{ff}\x{fe}", "ASCII" is returned.
return "ASCII";
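For the PowerShell side, the steps above can be sketched roughly as follows. This is only a sketch under a few assumptions: the function name Get-LogEncoding is made up for illustration, it uses the .NET System.IO.File class rather than Get-Content, and it keeps the Perl behaviour of answering "ASCII" when the file cannot be opened.

```powershell
# Hypothetical name; a sketch that mirrors the Perl sub step by step.
function Get-LogEncoding {
    param([Parameter(Mandatory = $true)][string]$Path)

    try {
        # Open read-only. Note that .NET resolves relative paths against the
        # process working directory, so a full path is the safer input.
        $stream = [System.IO.File]::OpenRead($Path)
    }
    catch {
        return 'ASCII'   # same fallback as the Perl code when open() fails
    }

    try {
        $buffer = New-Object byte[] 2
        $count = $stream.Read($buffer, 0, 2)   # number of bytes actually read
    }
    finally {
        $stream.Dispose()                      # counterpart of close($fh)
    }

    # FF FE is the UTF-16 little-endian byte order mark.
    if ($count -eq 2 -and $buffer[0] -eq 0xFF -and $buffer[1] -eq 0xFE) {
        return 'UTF-16'
    }
    return 'ASCII'
}
```

It would be called as Get-LogEncoding -Path $logFile, where $logFile stands in for whatever holds the log path in your script. Unlike the Perl version, the byte count is actually checked, so a file shorter than two bytes is never compared against the BOM.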
Upvotes: 3
Reputation: 16037
The script has already read two bytes from $f into $b: my $n = read ($fh,$b,2);
The line in question tests whether these two bytes are literally FF and FE.
FF FE is the byte order mark for the UTF-16 little-endian encoding; see http://unicode.org/faq/utf_bom.html
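If you want to check the byte order marks yourself from PowerShell, .NET exposes them through Encoding.GetPreamble(). A small sketch (the helper name Get-BomHex is made up for illustration):

```powershell
# Hypothetical helper: returns an encoding's byte order mark as hex text.
function Get-BomHex {
    param([Parameter(Mandatory = $true)][System.Text.Encoding]$Encoding)

    $bom = $Encoding.GetPreamble()
    ($bom | ForEach-Object { $_.ToString('X2') }) -join ' '
}

Get-BomHex ([System.Text.Encoding]::Unicode)            # FF FE (UTF-16 little endian)
Get-BomHex ([System.Text.Encoding]::BigEndianUnicode)   # FE FF (UTF-16 big endian)
Get-BomHex ([System.Text.Encoding]::UTF8)               # EF BB BF
```

Note that .NET calls UTF-16 little endian simply "Unicode", which is why the FF FE case in the Perl code lines up with that name in PowerShell.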
Upvotes: 1
Reputation: 60918
Adapted from http://franckrichard.blogspot.com/2010/08/powershell-get-encoding-file-type.html
function Get-FileEncoding {
    [CmdletBinding()]
    Param (
        [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)]
        [string]$Path
    )
    # Read the first 4 bytes of the file. -Encoding Byte works in Windows PowerShell;
    # PowerShell 6 and later use -AsByteStream instead.
    [byte[]]$byte = Get-Content -Encoding Byte -ReadCount 4 -TotalCount 4 -Path $Path
    if ($byte[0] -eq 0xef -and $byte[1] -eq 0xbb -and $byte[2] -eq 0xbf)
    { Write-Output 'UTF8' }
    elseif ($byte[0] -eq 0xff -and $byte[1] -eq 0xfe)
    { Write-Output 'Unicode' }             # UTF-16 little endian (FF FE), the case the Perl code tests
    elseif ($byte[0] -eq 0xfe -and $byte[1] -eq 0xff)
    { Write-Output 'BigEndianUnicode' }    # UTF-16 big endian (FE FF)
    elseif ($byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xfe -and $byte[3] -eq 0xff)
    { Write-Output 'UTF32' }               # UTF-32 big endian (00 00 FE FF)
    elseif ($byte[0] -eq 0x2b -and $byte[1] -eq 0x2f -and $byte[2] -eq 0x76)
    { Write-Output 'UTF7' }
    else
    { Write-Output 'ASCII' }
}
Upvotes: 1