PNVR

Reputation: 1

How do I read files containing Japanese text with a Perl script?

I want to search for a word (in Japanese) in a group of files that contain Japanese text.

I tried handling them like ordinary files, but I get an error like "Wide character in print" at the line of the print statement.

I used

    use Unicode::Japanese;
    use Unicode::Japanese qw(PurePerl);

as suggested on some websites.

Here is the code I am using:

    my $dr = "My_Directory";
    opendir(DIR, $dr) or die "Cannot open $dr: $!";
    my @txtfiles = grep { /\.txt$/ } readdir(DIR);
    closedir(DIR);
    foreach my $file (@txtfiles)
    {
        my $count = 0;
        my @words = ();
        # join directory and file name with a path separator
        open(FILE, "$dr/$file") or die "Cannot open $dr/$file: $!";
        while (<FILE>)
        {
            push(@words, split(/\s+/));
        }
        close(FILE);
        foreach my $word (@words)
        {
            if ($word =~ m/$word_to_search/i)
            {
                $count++;
            }
        }
        print "$word_to_search occurs $count times in $file file\n";
    }

Any ideas would be very helpful.

Thanks in advance.

PNVR

Upvotes: 0

Views: 869

Answers (1)

daxim

Reputation: 39158

Please first read http://p3rl.org/UNI and apply the advice given there. The topic of encoding has come up many times on Stack Overflow already, and it is not specific to Japanese at all. (Google, SO tags, SO search)
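As a rough illustration of the decode-on-input, encode-on-output pattern that tutorial describes, here is a minimal sketch using the core Encode module. The byte string is a made-up sample (the UTF-8 encoding of two kanji), not data from the question:

    use strict;
    use warnings;
    use Encode qw(decode encode);

    # Bytes as they would arrive from a file or socket:
    # U+65E5 U+672C encoded as UTF-8 (hypothetical sample data).
    my $bytes = "\xE6\x97\xA5\xE6\x9C\xAC";

    my $text  = decode('UTF-8', $bytes);   # 1. decode bytes into characters
    my $chars = length $text;              # 2. work on characters (2, not 6)
    my $out   = encode('UTF-8', $text);    # 3. encode characters back to bytes

    print "$chars characters, ", length($out), " bytes\n";

Printing `$text` directly, without step 3 or an encoding layer on the output handle, is what triggers the "Wide character in print" warning.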

You mention that you saved a file as UTF-8. To get you started quickly, this is a way to read one:

    open my $fh, '<:encoding(UTF-8)', 'filename.txt';
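Putting that together with the loop from the question, a possible sketch looks like the following. It assumes the files and the script source are UTF-8 and that the terminal expects UTF-8 output. The helper name `count_word_in_file`, the temporary directory, and the sample text are made up for the demo:

    use strict;
    use warnings;
    use utf8;                            # string literals in this source are UTF-8
    use File::Temp qw(tempdir);
    use File::Spec;

    # Encode characters on output; this avoids "Wide character in print".
    binmode STDOUT, ':encoding(UTF-8)';

    # Count the whitespace-separated tokens in a UTF-8 file that contain $word.
    sub count_word_in_file {
        my ($path, $word) = @_;
        open my $fh, '<:encoding(UTF-8)', $path
            or die "Cannot open $path: $!";
        my $count = 0;
        while (my $line = <$fh>) {
            # grep in scalar context returns the number of matching tokens
            $count += grep { /\Q$word\E/ } split ' ', $line;
        }
        close $fh;
        return $count;
    }

    # Demo data in a temporary directory (hypothetical sample file).
    my $dir    = tempdir(CLEANUP => 1);
    my $sample = File::Spec->catfile($dir, 'sample.txt');
    open my $out, '>:encoding(UTF-8)', $sample
        or die "Cannot write $sample: $!";
    print {$out} "日本語 の テキスト\n日本語です 英語\n";
    close $out;

    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @txtfiles = sort grep { /\.txt\z/ } readdir $dh;
    closedir $dh;

    my $word_to_search = '日本語';
    my %counts;
    for my $file (@txtfiles) {
        my $path = File::Spec->catfile($dir, $file);
        $counts{$file} = count_word_in_file($path, $word_to_search);
        print "$word_to_search occurs $counts{$file} times in $file\n";
    }

If the search word comes from the command line or another file instead of a literal in the script, it has to be decoded as well before it is used in the regex.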

Upvotes: 3
