Jeffrey Strong

Reputation: 11

Remove UTF-16 BOM from string with Perl

I'm looking for the correct syntax to remove the BOM from a UTF-16 text file. I have successfully done it for UTF-8. Below is the syntax I have tried:

$readline =~ s/^\N{ZERO WIDTH NO-BREAK SPACE}//;
$readline =~ s/^\N{BYTE ORDER MARK}//;
$readline =~ s/^\N{BOM}//;
$readline =~ s/^\x{FEFF}//;
$readline =~ s/^\0x{FEFF}//;
$readline =~ s/^\x{FE}\x{FF}//;
$readline =~ s/^\xFE\xFF//;
$readline =~ s/^\0xFE\0xFF//;

As you can see, these are repetitive, but I was trying anything I could find. To open the file, I used the encoding function. Any help would be greatly appreciated.

Upvotes: 1

Views: 1260

Answers (1)

ikegami

Reputation: 385590

What's in $readline?

If you have UTF-16be,

s/^\xFE\xFF//

If you have UTF-16le,

s/^\xFF\xFE//

If you have Unicode Code Points (decoded text),

s/^\x{FEFF}//
s/^\N{BOM}//
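To make the distinction concrete, here is a minimal sketch of the three cases using the core Encode module. The sample bytes and variable names are made up for illustration. It also shows something that may explain why none of the attempts seemed to match: as far as I know, decoding with plain 'UTF-16' (no LE/BE suffix) reads the BOM to pick the byte order and drops it, so there is nothing left to remove, whereas 'UTF-16LE' and 'UTF-16BE' leave it in the text as U+FEFF.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative input: "Hi" encoded as UTF-16LE, preceded by the LE BOM (FF FE).
my $bytes = "\xFF\xFE" . "H\x00i\x00";

# Case 1: the string still holds raw bytes.
# Strip the two BOM bytes first, then decode.
(my $raw = $bytes) =~ s/^\xFF\xFE//;
my $from_bytes = decode('UTF-16LE', $raw);

# Case 2: the string holds decoded text.
# UTF-16LE does not treat the BOM specially, so it survives as U+FEFF.
my $from_text = decode('UTF-16LE', $bytes);
my $had_bom   = $from_text =~ s/^\x{FEFF}// ? 1 : 0;

# Case 3: plain 'UTF-16' uses the BOM to detect byte order and removes it.
my $auto = decode('UTF-16', $bytes);

printf "%s %s %s (BOM seen in case 2: %d)\n",
    $from_bytes, $from_text, $auto, $had_bom;
```

The same reasoning should apply to file handles: a handle opened with `<:encoding(UTF-16LE)` yields decoded text that still starts with U+FEFF, while `<:encoding(UTF-16)` should yield text with the BOM already consumed.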

Alternatively, you can use File::BOM to both remove the mark and decode the stream.

Upvotes: 5
