PKV

Reputation: 177

Perl read .DAT file with UTF-8 BOM format and write it with UTF-8 format without BOM

I have a .DAT file with CRLF line endings, encoded as UTF-8 with a BOM. I'm trying to convert it to UTF-8 without a BOM, keeping the CRLF line endings, using Perl. I'm currently using the following code. Even though the output file is generated without the BOM, the header is missing from it; only the rest of the data is written. I need the final output file to be UTF-8 without a BOM, with the header included along with the rest of the data.

use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding

sub encodeWithoutBOM
{
    my $src = $_[1];
    my $des = $_[2];
    my @array;
    open(SRC,'<',$src) or die $!;
    # open destination file for writing
    open(DES,'>',$des) or die $!;
    print("copying content from $src to $des\n");
    while(<SRC>){
         @array = <SRC>;    
    }
    foreach (@array){
    print DES;
    }
    close(SRC);
    close(DES); 
} 

Upvotes: 1

Views: 532

Answers (2)

Shawn

Reputation: 52374

Another option is to use File::BOM from CPAN, which lets you transparently handle the byte order mark:

#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use File::BOM qw/open_bom/;

sub encode_without_bom {
    my ($src, $dst) = @_;

    open_bom(my $infile, $src, ":encoding(UTF-8)");
    open my $outfile, ">:utf8", $dst;
    say "Copying from $src to $dst";
    while (<$infile>) {
        print $outfile $_;
    }
}

encode_without_bom "input.txt", "output.txt";

Upvotes: 2

ikegami

Reputation: 385799

use open ':std', ':encoding(UTF-8)';

while (<>) {
   s/^\N{BOM}// if $. == 1;
   print;
}
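
The loop above reads from the files named on the command line and writes to STDOUT. For context on the missing header: in the question's code, `while(<SRC>)` reads the first line into `$_`, and `@array = <SRC>` then slurps only the remaining lines, so the first line is never printed. Below is a minimal sketch of the same BOM-stripping idea wrapped in a sub that takes a source and a destination path. The name `strip_bom_copy` and the explicit `:raw` layer are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $src to $dst, dropping a leading BOM if present.
sub strip_bom_copy {
    my ($src, $dst) = @_;

    # :raw keeps line endings untouched (no CRLF translation on any platform);
    # :encoding(UTF-8) decodes on input and encodes on output.
    open(my $in,  '<:raw:encoding(UTF-8)', $src) or die "Cannot open $src: $!";
    open(my $out, '>:raw:encoding(UTF-8)', $dst) or die "Cannot open $dst: $!";

    while (my $line = <$in>) {
        # A BOM (U+FEFF) can only appear at the start of the first line.
        $line =~ s/^\x{FEFF}// if $. == 1;
        print {$out} $line;
    }

    close($in);
    close($out) or die "Cannot close $dst: $!";
}
```

It would be called as `strip_bom_copy("input.dat", "output.dat");` where both file names are placeholders. Every line, including the first, passes through the same `print`, so the header is kept.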

Upvotes: 2
