Reputation: 25
my @folder = ('s,c%','c__pp_p','Monday_øå_Tuesday, Wednesday','Monday & Tuesday','Monday_Tuesday___Wednesday');
for my $folder (@folder) {
    if ($folder =~ s/[^\w_*\-]/_/g) {
        $folder =~ s/_+/_/g;
        print "$folder : Got %\n";
    }
}
Using the code above, I am not able to handle "Monday_øå_Tuesday_Wednesday".
The output should be:
s_c
c_pp_p
Monday_øå_Tuesday_Wednesday
Monday_Tuesday
Monday_Tuesday_Wednesday
Upvotes: 1
Views: 513
Reputation: 53478
You can use \W to negate the \w character class, but the problem you've got is that \w doesn't match your non-ASCII letters.
So you need to do something like this instead:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                             # decode the non-ASCII literals below as characters, not bytes
binmode STDOUT, ':encoding(UTF-8)';   # encode them again on output

my @folder = ('s,c%','c__pp_p','Monday_øå_Tuesday, Wednesday','Monday & Tuesday','Monday_Tuesday___Wednesday');
# Replace every run of non-letters with a single underscore, in place.
s/[^\p{Alpha}]+/_/g for @folder;
print "$_\n" for @folder;
Outputs:
s_c_
c_pp_p
Monday_øå_Tuesday_Wednesday
Monday_Tuesday
Monday_Tuesday_Wednesday
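This uses a Unicode property (these are documented in perldoc perluniprop). The long and short of it is that \p{Alpha} is the Unicode alphabetic set: much like \w in that it covers letters from any script, but it is letters only, so digits and the underscore are not included. If you also want to keep digits, \p{Alnum} is the alphanumeric equivalent.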
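To make the difference between these classes concrete, here is a small sketch that tests a few single characters against each one. The sample characters and the classes_for helper are my own illustration, not part of the question, and it assumes the script is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                                # string literals below are decoded UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: list which classes match a single character.
sub classes_for {
    my ($char) = @_;
    my @hits;
    push @hits, '\p{Alpha}' if $char =~ /\A\p{Alpha}\z/;
    push @hits, '\p{Alnum}' if $char =~ /\A\p{Alnum}\z/;
    push @hits, '\w'        if $char =~ /\A\w\z/;
    return join ' ', @hits;
}

printf "%s => %s\n", $_, classes_for($_) for 'a', 'ø', '7', '_', '%';
```

The letters match all three classes, the digit matches \p{Alnum} and \w but not \p{Alpha}, the underscore matches only \w, and % matches none of them.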
Note that this leaves a trailing _ on the first line (s_c_), whereas your expected output shows s_c. To get rid of it, it's probably easier to run:
s/_$// for @folder;
than make a more complicated pattern.
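If you would rather keep both steps in one place, something like the following might do. It is only a sketch: sanitize_name is a hypothetical helper name, it also strips leading underscores (which the question does not ask about), and it assumes the strings are already decoded, here via use utf8.

```perl
use strict;
use warnings;
use utf8;                                # string literals below are decoded UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: returns a cleaned copy and leaves the argument untouched.
sub sanitize_name {
    my ($name) = @_;
    $name =~ s/[^\p{Alpha}]+/_/g;   # each run of non-letters becomes one underscore
    $name =~ s/\A_+|_+\z//g;        # drop underscores at either end
    return $name;
}

my @folder = ('s,c%', 'c__pp_p', 'Monday_øå_Tuesday, Wednesday',
              'Monday & Tuesday', 'Monday_Tuesday___Wednesday');
print sanitize_name($_), "\n" for @folder;
```

For the five names from the question this prints the list given there as the desired output, including s_c without the trailing underscore.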
Upvotes: 2