Agargara

Reputation: 932

Regex to find the number of extra spaces, including trailing and leading spaces

I'm trying to count the number of extra spaces in a string, including leading and trailing spaces. There are a lot of suggestions out there, but none of them gets the count exactly right.

Example (_ indicates a space):

__this is a string__with extra spaces__

should match 5 extra spaces.

Here's my code:

if (my @matches = $_[0] =~ m/(\s(?=\s)|(?<=\s)\s)|^\s|\s$/g){
    push @errors, {
        "error_count" => scalar @matches,
        "error_type"  =>  "extra spaces",
    };
}

The problem with this regex is that it counts spaces in the middle twice. However, if I take out one of the look-ahead/look-behind matches, like so:

$_[0] =~ m/\s(?=\s)|^\s|\s$/g

it no longer counts both extra spaces at the beginning of the string: only the first of the two is matched. (My test string would only match 4 spaces instead of 5.)
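
A minimal reproduction of the two counts described above, assuming the example string with real spaces in place of the underscores. `count_matches` is a hypothetical helper introduced only for this illustration, and the first pattern is written without the capture group since only the number of matches matters here:

```perl
use strict;
use warnings;

# Hypothetical helper: number of times $re matches in $str.
sub count_matches {
    my ($str, $re) = @_;
    # Assigning the match list to "()" and then to a scalar yields the match count.
    my $count = () = $str =~ /$re/g;
    return $count;
}

my $str = "  this is a string  with extra spaces  ";

# Both lookarounds: every space in a run of two matches, so runs are over-counted.
print count_matches($str, qr/\s(?=\s)|(?<=\s)\s|^\s|\s$/), "\n";  # prints 6

# Look-ahead only: the second leading space is missed.
print count_matches($str, qr/\s(?=\s)|^\s|\s$/), "\n";            # prints 4
```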

Upvotes: 1

Views: 280

Answers (3)

AdrianHHH

Reputation: 14047

With three simple regular expressions (and replacing spaces with underscores for clarity) you could use:

use strict;
use warnings;

my $str = "__this_is_a_string__with_extra_underscores__";

my $temp = $str;

$temp =~ s/^_+//;
$temp =~ s/_+$//;
$temp =~ s/__+/_/g;

my $num_extra_underscores = (length $str) - (length $temp);

print "The string '$str' has $num_extra_underscores extra underscores\n";
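
The same three substitutions can be applied to real spaces. This is a minimal sketch, assuming only the space character counts (not tabs or newlines); `count_by_stripping` is a hypothetical name, not something from the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: extra spaces = original length minus cleaned-up length.
sub count_by_stripping {
    my ($str) = @_;
    my $temp = $str;
    $temp =~ s/^ +//;     # drop leading spaces
    $temp =~ s/ +$//;     # drop trailing spaces
    $temp =~ s/  +/ /g;   # collapse each inner run to a single space
    return length($str) - length($temp);
}

print count_by_stripping("  this is a string  with extra spaces  "), "\n";  # prints 5
```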

Upvotes: 0

Andrew Cheong

Reputation: 30273

Try

$_[0] =~ m/^\s|(?<=\s)\s|\s(?=\s*$)/g

This should match

  1. the first space (if one exists),
  2. each space that follows a space,
  3. and that one trailing space that immediately follows the last non-space (the rest of the trailing spaces are already counted by the second case).

In other words, for your example, here's what each of the three cases would match:

__this is a string _with extra spaces__
12                 2                 32

This also works for the edge case of all spaces:

_____
12222
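
To turn the pattern into a count, one option is Perl's `() =` idiom, which forces list context and then yields the number of matches. A minimal sketch, where `count_extra_spaces` is a hypothetical wrapper name:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above.
sub count_extra_spaces {
    my ($str) = @_;
    # The pattern has no capture groups, so each list element is one match.
    my $count = () = $str =~ /^\s|(?<=\s)\s|\s(?=\s*$)/g;
    return $count;
}

print count_extra_spaces("  this is a string  with extra spaces  "), "\n";  # prints 5
print count_extra_spaces("     "), "\n";                                    # prints 5
```

Because every match is exactly one character wide, the number of matches equals the number of extra spaces.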

Upvotes: 2

OGHaza

Reputation: 4795

This regex should match all unnecessary individual spaces

^( )+|( )(?= )|( )+$

or

$_[0] =~ m/^( )+|( )(?= )|( )+$/g

You could change the spaces to \s, but then it will also count tabs and other whitespace such as newlines.

Working on RegexPal

Breakdown:

^( )+ Match any spaces connected to the start of the line

( )(?= ) Match any spaces that are immediately followed by another space

( )+$ Match any spaces connected to the end of the line
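
One caveat when counting with this pattern in Perl: `^( )+` consumes a whole leading run as a single match, and the capture groups add extra elements to the list returned by `m//g`, so the number of list elements is not the number of extra spaces. A minimal sketch that sums the lengths of the matches instead, assuming a simplified form of the pattern with one outer group; `count_matched_chars` is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: total number of characters covered by all matches.
sub count_matched_chars {
    my ($str) = @_;
    my $total = 0;
    # Same three alternatives, wrapped in one group so $1 is the full match.
    while ($str =~ /(^ +| (?= )| +$)/g) {
        $total += length $1;
    }
    return $total;
}

print count_matched_chars("  this is a string  with extra spaces  "), "\n";  # prints 5
```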

Upvotes: 0
