photosynthesis

Reputation: 2890

How to keep a delimiter character inside a word when splitting a string on that character

First, sorry for my English and the confusing description in the title.

My problem is that I have multiple lines of natural-language text and I want to count the words they contain. I came up with the following regex in Perl:

my @words = split /[ :,.;\s\/\t!"\n]+/, $_;

It works fine, except that when it encounters a word like 'U.S.A' it breaks the word into U, S and A, which is not what I want. How can I fix this? Thanks.

Upvotes: 0

Views: 63

Answers (1)

Miller

Reputation: 35208

I'd split on whitespace, then strip any non-word characters from the beginning and end of each "word". That way the periods inside a word are left alone, and U.S.A. ends up as U.S.A, with only the trailing period removed.

use strict;
use warnings;

local $_ = 'hello world, U.S.A., and other places.';

# Split on whitespace, then trim leading and trailing non-word characters from each piece.
my @words = map { s/^\W+|\W+$//g; $_ } split /\s+/, $_;

use Data::Dump;
dd \@words;

Outputs:

["hello", "world", "U.S.A", "and", "other", "places"]
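
Since the goal in the question is a word count, note that a token made up only of punctuation (a lone dash, for example) would be trimmed to an empty string and still be counted. A minimal sketch of one way to handle that, assuming Perl 5.14 or later for the /r modifier; words_in is a hypothetical helper name used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): return the words found in one line.
sub words_in {
    my ($line) = @_;
    return grep { length }              # drop tokens that were punctuation only
           map  { s/^\W+|\W+$//gr }     # /r returns a trimmed copy (needs Perl 5.14+)
           split ' ', $line;            # split on whitespace, skipping leading whitespace
}

my @words = words_in('hello world, U.S.A., and - other places.');
print scalar(@words), "\n";             # prints 6
```

Using split ' ' rather than split /\s+/ also avoids an empty first element when a line starts with whitespace.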

Upvotes: 1
