user1357713

Regex (perl) that will match (globally) one or more spaces occurring other than at the beginning of the line

My understanding is that /[^\A] +/mg will match globally one or more spaces occurring other than at the beginning of the string or just after newline.

Apparently, I'm wrong.

#!/usr/bin/env perl
use strict;
use warnings;

my $str = "         word1     word2\n     word3     word4     word5\n";
print "str before = $str\n";
$str =~ s/[^\A] +/ /mg;
print "str after  = $str\n";

Output:

str before =          word1     word2
     word3     word4     word5

str after  =  word word2 word word word5

The desired output is:

str before =          word1     word2
     word3     word4     word5

str after  =          word1 word2
     word3 word4 word5

So the leading spaces are preserved in number but multiple spaces occurring after the beginning of each line are reduced to a single space.

I'm not finding what I'm looking for in either perldoc perlretut or perldoc perlre (even after searching through all the instances of "[^" with /\[\^). Many thanks in advance.

Upvotes: 0

Views: 2920

Answers (3)

Ωmega

Reputation: 43683

In Perl, the simplest solution is: s/\S\K +/ /g;

See this demo.
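For anyone who wants to try this locally, here is a minimal sketch that applies the substitution to the sample string from the question. The sub name squeeze_inner_spaces is made up for illustration and is not part of the answer. Note that \K needs Perl 5.10 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper name, for illustration only.
# \S matches one non-whitespace character; \K then drops it from the
# matched text, so only the run of spaces after it is replaced.
sub squeeze_inner_spaces {
    my ($text) = @_;
    $text =~ s/\S\K +/ /g;
    return $text;
}

my $str = "         word1     word2\n     word3     word4     word5\n";
print squeeze_inner_spaces($str);
```

Because a newline counts as whitespace, \S cannot match it, so the indentation at the start of each line is never preceded by a match and is left alone.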

Upvotes: 4

Borodin

Reputation: 126732

As m.buettner says, [...] in a regex is a character class, and it can contain only characters, not patterns. In fact, your code generates the warning

Unrecognized escape \A in character class
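To see what that means in practice: Perl passes the unrecognized \A through as a literal A, so the class in the question behaves like [^A], "any character except A". The sketch below writes [^A] directly (to avoid the warning) and reproduces the output from the question. The sub name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper name, for illustration only.
# [^A] is what the question's [^\A] amounts to. It matches spaces and
# newlines too, so each match swallows one extra character before the
# run of spaces: a leading space, a digit, or a newline.
sub squeeze_like_question {
    my ($text) = @_;
    $text =~ s/[^A] +/ /mg;
    return $text;
}

my $str = "         word1     word2\n     word3     word4     word5\n";
print squeeze_like_question($str);   # prints " word word2 word word word5"
```

That is why the digits after word1, word3 and word4 disappear and the two lines are joined: the character in front of each run of spaces is consumed along with it.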

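But a string of spaces that's not at the start of a line is a string of spaces preceded by a character that is neither a space nor a newline, so all you need to write is this. (A plain [^ ] is not enough, because it also matches the newline in front of the second line's indentation.)

use strict;
use warnings;

my $str = "         word1     word2\n     word3     word4     word5\n";

print qq(String before = "$str"\n);

# [^ \n] matches the last character of a word; \K keeps it out of the replacement
$str =~ s/[^ \n]\K +/ /g;

print qq(String after = "$str"\n);

output

String before = "         word1     word2
     word3     word4     word5
"
String after = "         word1 word2
     word3 word4 word5
"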

Upvotes: 0

Martin Ender

Reputation: 44269

I think you cannot use \A in a character class, since it is not a character. You could go with two negative lookbehinds though:

$str =~ s/(?<!^)(?<! ) +/ /mg;

That makes sure that the match can start neither at the beginning of a line nor after another space. The latter condition is important: without it, if you have multiple spaces at the beginning of a line, the regex would simply start matching from the second one.
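Here is a small sketch of that second point, comparing the pattern with and without the (?<! ) check on one indented line. Both sub names are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Both lookbehinds: not at a line start, and not preceded by a space.
sub squeeze_with_both {
    my ($text) = @_;
    $text =~ s/(?<!^)(?<! ) +/ /mg;
    return $text;
}

# Only the first lookbehind: the leading run can still be matched
# starting from its second space.
sub squeeze_without_space_check {
    my ($text) = @_;
    $text =~ s/(?<!^) +/ /mg;
    return $text;
}

my $str = "     word3     word4\n";
print squeeze_with_both($str);            # five leading spaces kept
print squeeze_without_space_check($str);  # leading run shrinks to two spaces
```

In the second version the first space survives only because the match cannot start at the line start; the remaining four leading spaces are collapsed into one.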

By the way, to increase readability when using literal space characters in regular expressions, a neat trick is to wrap them in a character class:

$str =~ s/(?<!^)(?<![ ])[ ]+/ /mg;

Working demo.

Upvotes: 2
