Reputation: 2589
How can I open a file that contains CJK characters using a Perl script? I have:
use utf8;
use open ':encoding(utf8)';
binmode STDOUT, ':utf8';
I am using the code above to open the file, but I can't find the CJK
characters; the result is empty. The input file contains this text:
\para[para]Details of the electronic structure of the metal the loss to
electron-hole-pair excitations \characters{刘安雯} only
depends weakly on the metal. The metals exhibit large variation in the
work function, yet the translational in elasticity is similar in all
cases. This suggests electron transfer forming a transient
H\textsuperscript{{\textminus}} is not important. The simulation allows
us to construct \characters{胡水明} a universal sticking
function for H and D on metals, which depends only on the H atom
incidence translational energy and incidence angle as well as the mass of
the solid's atoms.\endp
I am searching for them this way:
while($str=m/(\p{InCJK_Unified_Ideographs})/xg)
{
print "Char: --> $&\n";
}
Could someone point out what I am doing wrong in my code? Thanks.
Updated:
I don't know why, but this program works fine and prints the CJK characters:
use utf8;
my $str = "\characters{刘安雯胡水明}";
while($str=~m/(\p{InCJK_Unified_Ideographs}){1,}/xg) { print ":: $&\n"; }
Upvotes: 0
Views: 67
Reputation: 241808
use utf8;
This line tells Perl that the source code contains UTF-8, so it's not related to reading from a file.
use open ':encoding(utf8)';
This is equivalent to
use open IO => ':encoding(utf8)';
which sets the default encoding for input and output handles opened in its lexical scope, but it doesn't change the encoding of the already-open standard input and output. To do so, you need to add :std:
use open IO => ':utf8', ':std';
The last line shown,
binmode STDOUT, ':utf8';
sets the encoding for STDOUT, which would already be covered by the previous line if it used :std.
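To check what the pragma actually did, you can list the PerlIO layers on the standard handles. This is only a minimal sketch: it assumes you want the strict :encoding(UTF-8) layer rather than :utf8, and it uses the built-in PerlIO::get_layers function to report the layers.

```perl
use strict;
use warnings;
# Default layer for newly opened handles; :std extends it to STDIN/STDOUT/STDERR.
use open qw(:std :encoding(UTF-8));

# List the PerlIO layers currently active on the standard handles.
my @in_layers  = PerlIO::get_layers(\*STDIN);
my @out_layers = PerlIO::get_layers(\*STDOUT);
print "STDIN:  @in_layers\n";
print "STDOUT: @out_layers\n";
```

If :std took effect, an encoding(...) entry should appear in the STDOUT list. Without :std, only handles opened later in the same lexical scope get the layer.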
You didn't show how you opened the file. If you used <> or readline without specifying a filehandle, you need to set the encoding for the standard input as shown above. If you used a filehandle, I'm out of ideas - it works for me.
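For the filehandle case, here is a self-contained sketch of what works for me. The temporary file and its two lines are a hypothetical stand-in for your input, and I put the encoding layer directly on the open call instead of relying on the pragma. Note that the inner loop uses =~; written as $str=m/.../, Perl matches against $_ and assigns the result to $str.

```perl
use strict;
use warnings;
use utf8;                          # this source contains literal UTF-8 text
use File::Temp qw(tempfile);

binmode STDOUT, ':encoding(UTF-8)';

# Create a small UTF-8 sample file (hypothetical stand-in for the real input).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':encoding(UTF-8)';
print {$tmp} "excitations \\characters{刘安雯} only\n";
print {$tmp} "construct \\characters{胡水明} a universal\n";
close $tmp or die "Cannot close $path: $!";

# Decode while reading by naming the layer in the open call itself.
open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my @found;
while (my $line = <$fh>) {
    # =~ binds the match to $line; + captures each run of CJK ideographs.
    while ($line =~ /(\p{InCJK_Unified_Ideographs}+)/g) {
        push @found, $1;
        print "Char: --> $1\n";
    }
}
close $fh;
```

Using the captured $1 instead of $& also keeps the output tied to the parenthesised group.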
Upvotes: 1