Reputation: 79
Hi, how do I remove punctuation? I already tried using [:punct:],
but it does not work for all punctuation. It removes only the dot (.),
and the other punctuation marks are still there. My task is to remove the paragraph (line) breaks, remove punctuation, and change all text to lower case.
This is my text file, snuker.txt:
snuker berjaya menarik perhatian kbs.
19981230
Sam Chong"" kiri dan ooi Chin Kay memberi sumbangan besar kepada pembangunan snuker tanah air
dengan merangkul pingat' emas sukan asia ti'ga belas tahun sembi'lan belas sembilan puluh lapan membuka
lembaran baru snuker dan biliard tanah air apabila mereka kian disegani dan berjaya menukar tanggapan.
negatif masyarakat tempatan terhadap sukan itu
And this is my Perl script:
#!/usr/bin/perl
use utf8;
if(! open(INPUT, '< snuker.txt'))
{
die "cannot open input file: $!";
}
if(! open(OUTPUT, '> output.txt'))
{
die "cannot open output file: $!";
}
select OUTPUT;
while($lines = <INPUT>)
{
if($lines =~ s/[\s[:punct:]]+$/ /g)
{
print "$lines";
}
}
close INPUT;
close OUTPUT;
close STDOUT;
The output looks like this. The other punctuation marks are still there;
only the dots (.) are gone:
snuker berjaya menarik perhatian kbs 19981230 Sam Chong"" kiri dan ooi Chin Kay memberi sumbangan besar kepada pembangunan snuker tanah air dengan merangkul pingat' emas sukan asia ti'ga belas tahun sembi'lan belas sembilan puluh lapan membuka lembaran baru snuker dan biliard tanah air apabila mereka kian disegani dan berjaya menukar tanggapan negatif masyarakat tempatan terhadap sukan itu
Upvotes: 0
Views: 3622
Reputation: 185189
Try this:
#!/usr/bin/perl
use strict; use warnings;
$/ = ""; # paragraph mode: read the file one paragraph at a time
while (<>) {
s/\p{Punct}//g;
s/(?:\n|\s+)/ /g;
print lc($_);
}
Run it like this:
perl script.pl < test_file > output.txt
Output:
snuker berjaya menarik perhatian kbs 19981230 sam chong kiri dan ooi chin kay memberi sumbangan besar kepada pembangunan snuker tanah air dengan merangkul pingat emas sukan asia tiga belas tahun sembilan belas sembilan puluh lapan membuka lembaran baru snuker dan biliard tanah air apabila mereka kian disegani dan berjaya menukar tanggapan negatif masyarakat tempatan terhadap sukan itu
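One caveat, as far as I understand the perlrecharclass documentation: in the ASCII range \p{Punct} and [[:punct:]] do not match exactly the same characters. \p{Punct} skips the nine symbol characters $ + < = > ^ ` | ~, which does not matter for the sample text here but could matter for other input. A small sketch to check it on your own Perl (the leftovers helper and the $ascii test string are made up for this illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# All 32 ASCII punctuation and symbol characters (illustrative test string).
my $ascii = q{!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~};

# Hypothetical helper: delete every match of $class and return what survives.
sub leftovers {
    my ($class, $text) = @_;
    $text =~ s/$class//g;
    return $text;
}

# Show which characters each class leaves behind.
printf "left by %-12s: %s\n", '\p{Punct}',   leftovers(qr/\p{Punct}/,   $ascii);
printf "left by %-12s: %s\n", '[[:punct:]]', leftovers(qr/[[:punct:]]/, $ascii);
```

If symbols such as $ or + have to go as well, [[:punct:]] is probably the safer choice for plain ASCII text.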
Upvotes: 0
Reputation:
#!/usr/bin/perl
use utf8;
if(! open(INPUT, '< test_file'))
{
die "cannot open input file: $!";
}
if(! open(OUTPUT, '> output.txt'))
{
die "cannot open output file: $!";
}
select OUTPUT;
while($lines = <INPUT>)
{
$lines =~ s/\n/ /g;
$lines =~ s/[[:punct:]]//g;
print lc("$lines");
}
close INPUT;
close OUTPUT;
close STDOUT;
Upvotes: 0
Reputation: 241908
Remove the dollar sign from the regex. It makes your pattern match only at the end of a line.
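To illustrate, here is a minimal sketch that compares the anchored pattern from the question with an unanchored one. The sample string and the helper names strip_anchored and strip_everywhere are invented for this example, and the second helper is only one possible way to do the cleanup:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the pattern from the question. The trailing $ means
# only a run of whitespace/punctuation at the END of the line can match.
sub strip_anchored {
    my ($line) = @_;
    $line =~ s/[\s[:punct:]]+$/ /g;
    return $line;
}

# Hypothetical helper: no anchor. Punctuation is deleted everywhere, runs of
# whitespace (including the newline) become one space, and the text is lowercased.
sub strip_everywhere {
    my ($line) = @_;
    $line =~ s/[[:punct:]]+//g;
    $line =~ s/\s+/ /g;
    return lc $line;
}

# Made-up sample line with punctuation in the middle and at the end.
my $sample = qq{Sam Chong"" kiri, pingat' emas.\n};
print strip_anchored($sample),   "\n";
print strip_everywhere($sample), "\n";
```

The first line printed still contains the quotes, the comma, and the apostrophe, because only the final dot and newline sit at the end of the line. The second line has no punctuation left.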
Upvotes: 6