I have a file of approximately 12,000 lines that is generated every 6 hours. Some of these lines contain non-ASCII characters.
I would like to run a Perl script that removes every line containing non-ASCII characters.
You can do:
perl -i.bak -ne 'print unless(/[^[:ascii:]]/)' file
Regex explanation for /[^[:ascii:]]/:
/           start of regular expression
[           start of character class
^           make this a negative character class (a class that matches anything besides what is listed)
[:ascii:]   any ASCII character
]           end of character class
/           end of regular expression
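If you prefer a standalone script to the one-liner, the following is a minimal sketch of the same filter. The helper name `has_non_ascii` and the script name used below are hypothetical. It assumes the file is read as raw bytes (no decoding layer), so any UTF-8 or Latin-1 character appears as one or more bytes above 0x7F and is caught by the same /[^[:ascii:]]/ class. Setting `$^I` reproduces what `-i.bak` does on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the line contains any character outside 0x00-0x7F.
sub has_non_ascii {
    my ($line) = @_;
    return $line =~ /[^[:ascii:]]/;
}

# Edit each file named on the command line in place, keeping a .bak copy
# (the script equivalent of perl -i.bak -ne '...').
if (@ARGV) {
    local $^I = '.bak';
    while (my $line = <>) {
        # print goes to the in-place output handle; skipped lines are dropped.
        print $line unless has_non_ascii($line);
    }
}
```

Usage, assuming the script is saved as strip_non_ascii.pl: `perl strip_non_ascii.pl file`. The original contents are kept in file.bak.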
#!/usr/bin/perl -p
END {close STDOUT}
use 5.010;
use utf8;
use strict;
use autodie;
use warnings qw<FATAL all>;
use open qw<IN :bytes OUT :encoding(US-ASCII) :std>;
use Carp;
BEGIN {$SIG{__WARN__}=sub{confess}}
use sigtrap qw<stack-trace normal-signals error-signals>;
"disconcertingly";