Pithikos

Reputation: 20300

Splitting a string with a regexp in Perl when the delimiter appears at the beginning of the string

I want to split a string using regular expressions, but I have run into a problem. I have this string:

$text=" one two three";

Then I try to split it into alphabetic words:

#@words=split(" ", $text);          #1 this works

@words=split("[^a-zA-Z]", $text);   #2 this doesn't work

for $word (@words){
    print "word: |$word|\n";
}

The commented-out method (1) works fine. As expected, it prints:

word: |one|
word: |two|
word: |three|

However, with the second method (2) I get this:

word: ||
word: |one|
word: |two|
word: |three|

Logically, the second method should be equivalent to the first one, but in practice it behaves differently. Why is that?

Upvotes: 2

Views: 196

Answers (1)

Tim Pietzcker

Reputation: 336108

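This is a special case in Perl's split() function. A pattern consisting of a single space string (" ") skips leading whitespace, whereas any other pattern, such as [^a-zA-Z], is treated as an ordinary regex. The leading space in $text then matches at the very start of the string and produces an empty first field.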

As stated in perldoc:

split(/PATTERN/, expr, [limit])

If PATTERN is omitted, [it] splits on whitespace (after skipping any leading whitespace).

Empty leading fields are produced when there are positive-width matches at the beginning of the string; [...]

As a special case, specifying a PATTERN of space (' ') will split on white space just as split with no arguments does. Thus, split(' ') can be used to emulate awk's default behavior, whereas split(/ /) will give you as many initial null fields (empty string) as there are leading spaces.
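
To make the difference concrete, here is a small sketch that runs both splits on the string from the question and shows two possible ways to avoid the empty leading field. The variable names (@awk_style, @regex, @filtered, @matched) are made up for illustration, and filtering with grep or matching with /g are just two options, not the only ones:

```perl
use strict;
use warnings;

my $text = " one two three";

# Special case: a single-space string skips leading whitespace first.
my @awk_style = split(' ', $text);

# Ordinary regex: the leading space is a positive-width match at the
# start of the string, so an empty leading field is produced.
my @regex = split(/[^a-zA-Z]/, $text);

# Option 1 (illustrative): split on runs of non-letters, then drop
# any empty fields.
my @filtered = grep { length } split(/[^a-zA-Z]+/, $text);

# Option 2 (illustrative): match the words instead of splitting on
# the separators.
my @matched = $text =~ /[a-zA-Z]+/g;

print "word: |$_|\n" for @matched;
```

Matching the words directly sidesteps the question of leading delimiters altogether, since there is nothing to split before the first match.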

Upvotes: 10
