John Simpleton

Reputation: 51

Remove lines containing non-ASCII characters from a file in Perl

I have a file of approximately 12,000 lines that is generated every 6 hours. Some of these lines contain non-ASCII characters.

I would like to run a Perl script that removes every line containing non-ASCII characters.

Upvotes: 4

Views: 1462

Answers (2)

codaddict

Reputation: 455272

You can do:

perl -i.bak -ne 'print unless(/[^[:ascii:]]/)' file

Regex explanation for /[^[:ascii:]]/:

/ start of regular expression
  [ start of character class
  ^ make this a negative character class (a class that matches anything besides what is listed)
    [:ascii:] any ASCII character
  ] end of character class
/ end of regular expression

Upvotes: 6

tchrist

Reputation: 80423

#!/usr/bin/perl -p
END {close STDOUT}
use 5.010;
use utf8;
use strict;
use autodie;
use warnings qw<FATAL all>;
use open qw<IN :bytes OUT :encoding(US-ASCII) :std>;
BEGIN {$SIG{__WARN__}=sub{confess}}
use sigtrap qw<stack-trace normal-signals error-signals>;
use Carp;
"disconcertingly";

Upvotes: 1
