Ronja

Reputation: 23

Encoding problem using new line command in perl

The program works as long as I only print the special characters. But I want them separated, one per line, so that they can be sorted. As soon as I add the newline to the print statement, the characters turn into question marks. Can someone tell me why, and how to solve this problem?

#!/usr/bin/perl

while (<>) {
  while (/(.)/g) {
    if (ord($1) >= 128){
       print "$1\n";      
    }
  }
}

Upvotes: 2

Views: 147

Answers (2)

Polar Bear

Reputation: 6798

You appear to be on a UNIX system, but it is not clear which terminal you use or what locale settings your environment has.

Depending on the locale settings, not every character can be printed to the console, and you will see ? in place of those that cannot. Some characters are not meant to be printed at all (control characters, which have no visual representation).

You have two options:

  • adjust the locale settings to match the characters used
  • re-encode the input and output into an encoding the locale supports
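Before choosing between the two options, it can help to see what the environment currently reports. The following is only a minimal sketch: the helper name locale_settings is made up for illustration, and it assumes the usual POSIX convention that LC_ALL overrides LC_CTYPE, which in turn overrides LANG.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collect the environment
# variables that normally decide which character encoding the terminal expects.
sub locale_settings {
    return map { ($_ => $ENV{$_} // '(unset)') } qw(LC_ALL LC_CTYPE LANG);
}

# Print them in order of precedence: LC_ALL wins over LC_CTYPE, then LANG.
my %locale = locale_settings();
print "$_=$locale{$_}\n" for qw(LC_ALL LC_CTYPE LANG);
```

If none of the values mentions an encoding such as UTF-8, the terminal is probably not expecting UTF-8 output.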

Also, your code would probably be easier to read in the following form:

use strict;
use warnings;
use feature 'say';

my $debug = 0;

while (<DATA>) {
    chomp;
    say    if $debug;                        # echo the raw line when debugging
    print '[' . ord($_) . ']' for split //;  # print the code of every character
    say '' if $debug;
}

__DATA__
use strict;
use warnings;
use feature 'say';

while (<>) {
    say;
    map{ my $d = ord; print "[$d]" if $d >= 128 } split '', $_;
}

Upvotes: 0

choroba

Reputation: 241858

When opening a non-ASCII file, you should tell Perl what encoding the file has. When printing those characters, again, you should specify how they should be encoded on output.

For example, to process UTF-8 encoded characters, prepend the following to your code:

use open IO => ':encoding(UTF-8)', ':std';

See the documentation of the open pragma (perldoc open) for details.
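Why the newline matters, assuming the input is UTF-8: without a decoding layer, Perl reads bytes, so /(.)/ matches one byte at a time. A character such as é is stored as two bytes (0xC3 0xA9). Printed back to back, the bytes still form a valid sequence, but with a newline after each byte the sequence is broken and the terminal shows ? instead. The sketch below illustrates the same idea with explicit layers on individual handles rather than the pragma. The helper name non_ascii_chars and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read a filehandle and
# return every non-ASCII character found, sorted by code point.
sub non_ascii_chars {
    my ($fh) = @_;
    my @found;
    while (my $line = <$fh>) {
        # The handle decodes UTF-8, so split // yields characters, not bytes.
        push @found, grep { ord($_) >= 128 } split //, $line;
    }
    return sort @found;
}

# Sample input: "é" and "ä" stored as raw UTF-8 bytes in an in-memory file.
my $bytes = "caf\xC3\xA9 \xC3\xA4b\n";
open my $in, '<:encoding(UTF-8)', \$bytes
    or die "Cannot open in-memory file: $!";

# Encode the output as UTF-8 again so each character is written out whole.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for non_ascii_chars($in);
```

With the pragma from the answer in place, the original loop over <> should behave the same way without opening each handle by hand.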

Upvotes: 4
