DavidS

Reputation: 33

Using Perl Regex, how can I capture the position of the hyphens in a hyphenated word?

I'm trying to capture the position of all of the hyphens in a hyphenated word, so that I can load a hash with the position of those hyphens (in the text, not the word). At the moment, I'm trying a capture group inside of a non-capture group... but it's only capturing the last hyphen.

my $word           = shift (@_);
my $word_start_pos = shift (@_);
my $text           = shift (@_);

my $dash_pos  = 0;
my $exp       = 0;
my $pos       = 0;
my $test_char = '';
   
if ($word =~ /^(?:[\p{L&}0-9\.\'\/]{1,}([\-])){7,}[\p{L&}0-9\.\'\/]{1,}$/) {
   foreach $exp (1..$#-) {
      $pos = $-[$exp];
      $dash_pos = $word_start_pos + $pos;
      $test_char = substr($text, $dash_pos, 1);
      if ($test_char =~ /^[\-]$/) {
         &load_changes('-', $dash_pos, 'Dash', ' ', 'Replace');
      }
   }
}

Upvotes: 0

Views: 116

Answers (3)

ikegami

Reputation: 386541

push @pos, $-[0] while /-/g;

Demo:

$ perl -Mv5.14 -ne'my @pos; push @pos, $-[0] while /-/g; say "@pos";'
abc-de-fgh
3 6
-----
0 1 2 3 4

In context, you could replace

foreach $exp (1..$#-) {
   $pos = $-[$exp];
   ...
}

with

while ( $word =~ /-/g ) {
   my $pos = $-[0];
   ...
}
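
As a rough sketch of how that loop might feed the hash mentioned in the question: the helper name hyphen_text_positions, the hash %dash_at, and the sample text are all made up for illustration and are not part of the original code. The only assumption is that $word_start_pos is the 0-based offset of the word within the text.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the offsets (in the text) of every hyphen
# in $word, given the offset at which $word starts in the text.
sub hyphen_text_positions {
    my ($word, $word_start_pos) = @_;
    my @positions;
    while ($word =~ /-/g) {
        # $-[0] is the start offset of the current match within $word
        push @positions, $word_start_pos + $-[0];
    }
    return @positions;
}

my $text           = 'a well-thought-out plan';
my $word           = 'well-thought-out';
my $word_start_pos = index($text, $word);    # 2 for this sample text

# Hypothetical hash keyed by hyphen position in the text
my %dash_at = map { $_ => 1 } hyphen_text_positions($word, $word_start_pos);
print join(' ', sort { $a <=> $b } keys %dash_at), "\n";    # prints "6 14"
```

Each match is visited once, so no capture groups are needed; the word-relative offset is simply shifted by the word's start position.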

Upvotes: 3

user3408541

Reputation: 71

It's probably not the best idea to manually reset the pos variable. Doing so makes a global match jump back and forth in the string instead of processing it from beginning to end.

This is a straightforward use of the @- and @+ arrays, which work in conjunction with the pos value.

perldoc -v @-
@-      This array holds the offsets of the beginnings of the last
        successful match and any capture buffers it contains. 
        <cut>

perldoc -v @+
@+      This array holds the offsets of the ends of the last successful
        match and any matching capture buffers that the pattern
        contains.
        <cut>

perldoc -f pos
pos     Returns the offset of where the last "m//g" search left off for
        the variable in question ($_ is used when the variable is not
        specified).
        <cut>

$+[0] is equivalent to the pos value after each successful match. It sounds like you want $-[0], the offset where the match starts.

Here is the code...

#!/usr/bin/perl

my $s = q"here-is-a-string-with-a-lot-of-hyphens";
my @hyphenStack;

while($s=~/-/g){
  push(@hyphenStack, $-[0]);
}
for(@hyphenStack){
  print "$_\n";
}

Output looks like this...

$ perl find.hyphens.pl
4
7
9
16
21
23
27
30
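
The relationship between @-, @+ and pos described above can be checked with a short sketch. The helper name match_offsets is made up for illustration; it records $-[0], $+[0] and pos() right after each match of a single hyphen.

```perl
use strict;
use warnings;

# Hypothetical helper: for each hyphen, collect the match start ($-[0]),
# the match end ($+[0]) and pos() as reported right after the match.
sub match_offsets {
    my ($s) = @_;
    my @rows;
    while ($s =~ /-/g) {
        push @rows, [ $-[0], $+[0], pos($s) ];
    }
    return @rows;
}

for my $row (match_offsets('abc-de-fgh')) {
    my ($start, $end, $pos) = @$row;
    print "start=$start end=$end pos=$pos\n";
}
# prints:
# start=3 end=4 pos=4
# start=6 end=7 pos=7
```

For a one-character match the end offset is always the start offset plus one, which is why $-[0] is the value that points at the hyphen itself.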

Golfed at 54 characters

$ perl -e 'for(@ARGV){print"\n$_\n";while(/-/g){print "$-[0]\n"}}' test-1 test-2 test-3-4-5-6 here-is-a-string-with-a-lot-of-hyphens "here is a string with-spaces-and-hyphens"

test-1
4

test-2
4

test-3-4-5-6
4
6
8
10

here-is-a-string-with-a-lot-of-hyphens
4
7
9
16
21
23
27
30

here is a string with-spaces-and-hyphens
21
28
32

Upvotes: -1

Dave Sherohman

Reputation: 46225

I don't see any benefit to using regexes for this task. It's not a tool that's at all suited to the job. Here are two alternate approaches which are much simpler and more efficient:

#!/usr/bin/env perl    

use strict;
use warnings;
use 5.010;

my $str = 'Reg-ex-es are total over-kill for this search-task.';

say '--- using index ---';

my $last = 0;
# Compare against -1 explicitly: index returns 0 (false) for a hyphen at offset 0.
while ((my $pos = index($str, '-', $last)) != -1) {
  say $pos;
  $last = $pos + 1;
}

say '--- using split ---';
my @chars = split '', $str;
for my $pos (0 .. $#chars) {
  say $pos if $chars[$pos] eq '-';
}

Output:

--- using index ---
3
6
24
45
--- using split ---
3
6
24
45

(Note that the positions are 0-based.)

Upvotes: 1
