Reputation: 15

Perl: Request to improve my regex (match only positive/negative integers/decimals and commas)

It's hard to describe what I want to do, so I'll show it with an example.

My string:

my $string = q(
min_entry = -0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764
max_entry=0.236, 0.382, 0.500, 0.618, 0.764, 1.000
#jakis komentarz
rsi_confirm= 25,27,30, 32
slope3 = 0.236, 0.382, 0.5, 0.764
min_tp=0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1
interval = 14

[thresholds]
low = 40
high = 40
persistence = 9
);

My match pattern:

my @match = $string =~ /(([\d-\.]+[, ]+)+[\d-\.]+)/sg;
print Dumper \@match;

My results:

$VAR1 = [
          '-0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764',
          '0.618, ',
          '0.236, 0.382, 0.500, 0.618, 0.764, 1.000',
          '0.764, ',
          '25,27,30, 32',
          '30, ',
          '0.236, 0.382, 0.5, 0.764',
          '0.5, ',
          '0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1',
          '0.0764, '
        ];

I don't know why or how the elements at index 1 (value '0.618, '), 3 (value '0.764, '), 5, 7 and 9 are added by my regex, but I don't need them.

Result I would like to achieve:

print Dumper \@match;
$VAR1 = [
          '-0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764',
          '0.236, 0.382, 0.500, 0.618, 0.764, 1.000',
          '25,27,30, 32',
          '0.236, 0.382, 0.5, 0.764',
          '0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1',
        ]

Please base your answer on my regex. The only recurring characters that identify the values are "=" or "= " (before the pattern) and "," (in the middle of the pattern).

Upvotes: 1

Answers (3)

Borodin

Reputation: 126742

At a guess, this string is the contents of a file that you have read in its entirety to make things "easier". Unfortunately, that means you must explicitly cater for newline characters, which complicates things a lot.

Here's an example of what I would do using the DATA file handle. Building @aoa is reduced to a single statement. Of course, you may open a file and use the handle from that instead.

Mistakes in your code have caused lines with only a single number (and no comma) to be ignored. It's possible that you need that behaviour, but I have "fixed" it here.

use strict;
use warnings 'all';

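# One array ref per "key = values" line. Only the text after "=" is scanned,
# so the digit in a key such as slope3 is not mistaken for a value.
my @aoa = map { my ($rhs) = /=(.*)/; defined $rhs ? [ $rhs =~ /-?\d+(?:\.\d+)?/g ] : () } <DATA>;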

use Data::Dumper;
print Dumper \@aoa;

__DATA__
min_entry = -0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764
max_entry=0.236, 0.382, 0.500, 0.618, 0.764, 1.000
#jakis komentarz
rsi_confirm= 25,27,30, 32
slope3 = 0.236, 0.382, 0.5, 0.764
min_tp=0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1
interval = 14

[thresholds]
low = 40
high = 40
persistence = 9

output

I also suspect that even this is not the best solution to your problem, as you are discarding all the data labels, so you have no idea which list of numbers belongs to which category except by position.

This alternative builds a hash of arrays so that each label is retained as a key.

use strict;
use warnings 'all';

my %data;

while ( <DATA> ) {
    next unless /=/;                      # skip comments, blank lines and section headers
    my ($key, @values) = /[-\w.]+/g;      # first token is the label, the rest are values
    $data{$key} = \@values;
}

use Data::Dumper;

print Dumper \%data;


__DATA__
min_entry = -0.236, 0, 0.236 , 0.382, 0.500, 0.618, 0.764
max_entry=0.236, 0.382, 0.500, 0.618, 0.764, 1.000
#jakis komentarz
rsi_confirm= 25,27,30, 32
slope3 = 0.236, 0.382, 0.5, 0.764
min_tp=0.0125 , 0.0236, 0.0382, 0.05, 0.0764, 0.1
interval = 14

[thresholds]
low = 40
high = 40
persistence = 9

output

$VAR1 = {
          'high' => [
                      '40'
                    ],
          'interval' => [
                          '14'
                        ],
          'slope3' => [
                        '0.236',
                        '0.382',
                        '0.5',
                        '0.764'
                      ],
          'persistence' => [
                             '9'
                           ],
          'low' => [
                     '40'
                   ],
          'min_tp' => [
                        '0.0125',
                        '0.0236',
                        '0.0382',
                        '0.05',
                        '0.0764',
                        '0.1'
                      ],
          'min_entry' => [
                           '-0.236',
                           '0',
                           '0.236',
                           '0.382',
                           '0.500',
                           '0.618',
                           '0.764'
                         ],
          'max_entry' => [
                           '0.236',
                           '0.382',
                           '0.500',
                           '0.618',
                           '0.764',
                           '1.000'
                         ],
          'rsi_confirm' => [
                             '25',
                             '27',
                             '30',
                             '32'
                           ]
        };

This is the best I can do for you without understanding the full problem.
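
If the text is already held in a scalar, as $string is in the question, the same line-by-line loop can still be used by opening a filehandle on a reference to that scalar. The following is only a minimal sketch: the shortened $string and the names $fh and $line are assumptions made for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Shortened copy of the question's data (an assumption for illustration)
my $string = <<'END';
min_entry = -0.236, 0, 0.236
#jakis komentarz
interval = 14
END

# Open a read-only filehandle on the scalar itself
open my $fh, '<', \$string or die "Cannot open in-memory file: $!";

my %data;
while ( my $line = <$fh> ) {
    next unless $line =~ /=/;                      # skip comments and blank lines
    my ($key, @values) = $line =~ /[-\w.]+/g;      # first token is the label
    $data{$key} = \@values;
}
close $fh;

# Values can then be looked up by label
print "$_: @{ $data{$_} }\n" for sort keys %data;
```

Reading line by line this way avoids having to account for newlines in the pattern, while leaving the code that fills $string untouched.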

Upvotes: 1

Chris Turner

Reputation: 8142

Rather than using capture groups, you want to use clustering to group those parts of your regex together. Clustering is written (?:whatever) rather than (whatever), so your code would become:

my @match = $string =~ /(?:(?:[\d-\.]+[, ]+)+[\d-\.]+)/sg;
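
To see the effect, the non-capturing pattern can be run against a few lines of the sample data. This is a minimal sketch rather than a drop-in replacement: the shortened $string is an assumption made for illustration, and the character class is written [-\d.], which matches the same characters as [\d-\.] but puts the hyphen first so it cannot be read as a range.

```perl
use strict;
use warnings;

# Shortened copy of the sample data (an assumption for illustration)
my $string = <<'END';
min_entry = -0.236, 0, 0.236 , 0.382
#jakis komentarz
rsi_confirm= 25,27,30, 32
interval = 14
END

# No capturing groups at all, so m//g in list context
# returns one element per complete match
my @match = $string =~ /(?:[-\d.]+[, ]+)+[-\d.]+/g;

print "$_\n" for @match;
```

With this input, @match holds two elements, one per comma-separated list. The "interval = 14" line is still skipped because the pattern requires at least one separator.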

Upvotes: 1

jjmerelo

Reputation: 23537

You have two parentheses groups, one inside the other. The inner one is yielding every second result. You should use a non-capturing group for the inner grouping.
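
The effect is easy to reproduce on a short string. In list context, m//g returns the contents of every capture group for each match, and a repeated group keeps only the text from its last repetition. A small sketch, using a made-up input string and hypothetical variable names:

```perl
use strict;
use warnings;

my $input = '1, 2, 3';   # made-up input for illustration

# Two capturing groups: the outer one holds the whole list,
# the inner one holds only its last repetition
my @nested = $input =~ /((\d+, )+\d+)/g;      # ('1, 2, 3', '2, ')

# Inner group made non-capturing: only the whole list is returned
my @fixed  = $input =~ /((?:\d+, )+\d+)/g;    # ('1, 2, 3')

print scalar(@nested), " vs ", scalar(@fixed), "\n";   # prints "2 vs 1"
```

Keeping the outer parentheses is optional here: with no capture groups at all, m//g in list context returns the whole matches instead.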

Upvotes: 1
