Reputation: 94920
I have a file that contains parameters using this syntax
RANGE {<value> | <value>-<value>} [ , ...]
where each <value> is a number.
For example, all of these are valid:
RANGE 34
RANGE 45, 234
RANGE 2-99
RANGE 3-7, 15, 16, 2, 54
How can I parse the values to an array in Perl?
For the last example, I want my array to hold 3, 4, 5, 6, 7, 15, 16, 2, 54. The ordering of elements does not matter.
The most basic way is to check for a - symbol to determine whether there is a range, parse the range using a loop, and then parse the rest of the elements:
my @arr;
open my $fh, "<", "file.txt" or die (...);
while (<$fh>) {
if ($_ =~ /RANGE/) {
if ($_ =~ /-/) { # parse the range
< how do I parse the lower and upper limits? >
for($lower..$upper) {
$arr[++$#arr] = $_;
}
} else { # parse the first value
< how do I parse the first value? >
}
# parse the rest of the values after the comma
< how do I parse the values after the comma? >
}
}
I need help parsing the numbers. One way I can think of is to use successive splits (on -, on , and on whitespace). Is there a better way (clean and elegant, perhaps using a regex)?
Also, comments/suggestions on the overall program structure are welcome.
Upvotes: 3
Views: 1018
Reputation: 29790
Here's my effort:
sub parse_range {
my $str = shift;
return unless $str =~ /^RANGE /g;
my @array;
while ($str =~ / \G \s* ( \d+ ) ( - ( \d+ ) ) ? \s* (?: , | $ ) /gxc) {
push @array, $2 ? $1 .. $3 : $1;
}
return $str =~ /\G$/ ? @array : ();
}
It returns an empty list if the string parameter doesn't conform to the basic format you laid out.
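As a rough check of that behaviour, the sketch below repeats the sub so it runs on its own and calls it once with a well-formed line and once with a malformed one. The two sample strings are illustrative assumptions, not input from the question.

```perl
use strict;
use warnings;

# parse_range is repeated from the answer above so this sketch is self-contained.
sub parse_range {
    my $str = shift;
    return unless $str =~ /^RANGE /g;    # scalar //g leaves pos() just after the keyword
    my @array;
    while ($str =~ / \G \s* ( \d+ ) ( - ( \d+ ) ) ? \s* (?: , | $ ) /gxc) {
        push @array, $2 ? $1 .. $3 : $1;
    }
    # Success only if the loop consumed the whole string.
    return $str =~ /\G$/ ? @array : ();
}

# Sample inputs (assumptions for illustration).
my @ok  = parse_range('RANGE 3-7, 15, 16, 2, 54');
my @bad = parse_range('RANGE 3-7, abc');

print "ok:  @ok\n";
print "bad: ", scalar(@bad), " elements\n";
```

The /c modifier keeps pos() where the last successful match ended, which is what lets the final /\G$/ test tell a fully consumed string from one that stopped at unexpected text.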
Upvotes: 0
Reputation: 118645
I like using Perl's range and || operators for a problem like this:
map { my($x,$y)=split/-/; $x..$y||$x } split /\s*,\s*/;
If the token contains a -, the split /-/ statement sets both $x and $y, and the range from $x to $y is added to the map output. Otherwise only $x is set, so $y||$x falls back to $x and the one-element range $x..$x adds just $x to the output.
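The one-liner works on $_ and expects only the comma-separated list, so the RANGE keyword presumably has to be removed first. A minimal sketch of one way to wrap it follows; the sub name expand_line and the keyword-stripping substitution are assumptions added for illustration, not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical wrapper; only the map/split expression comes from the answer.
sub expand_line {
    my ($line) = @_;
    chomp $line;
    # Assumption: drop the keyword so only "3-7, 15, ..." remains.
    return unless $line =~ s/^RANGE\s+//;
    return map { my ($x, $y) = split /-/; $x .. ($y || $x) }
           split /\s*,\s*/, $line;
}

print join(' ', expand_line('RANGE 3-7, 15, 16, 2, 54')), "\n";
```

Because || tests truth rather than definedness, a token such as 5-0 would be treated as the single value 5. Using // instead (Perl 5.10 or later) would avoid that, if a zero upper bound can occur in the data.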
Upvotes: 2
Reputation: 118158
Along the same lines as other answers:
#!/usr/bin/perl
use strict; use warnings;
my $number = '[0-9]+';
my $range = "$number(?:-$number)?";
my $ranges = "$range(?:, $range)*";
my $pattern = qr/^RANGE ($ranges)$/;
while ( my $range = <DATA> ) {
next unless $range =~ $pattern;
my $expanded = expand_ranges($1);
print "@$expanded\n\n";
}
sub expand_ranges {
my ($ranges) = @_;
my @terms = split /, /, $ranges;
my @expanded;
for my $term ( @terms ) {
my ($lo, $hi) = split /-/, $term;
push @expanded, defined( $hi ) ? $lo .. $hi : $lo .. $lo;
}
return \@expanded;
}
__DATA__
RANGE 34
RANGE 45, 234
RANGE 2-99
RANGE 3-7, 15, 16, 2, 54
Output:
34

45 234

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

3 4 5 6 7 15 16 2 54
Upvotes: 1
Reputation: 53986
I would suggest parsing the line into a separate variable, as $_ tends to get clobbered by other function calls. You can remove the trailing newline at the same time, with chomp.
while (<$fh>)
{
chomp (my $line = $_);
# ...
}
Next, you need to detect the 'RANGE' indicator, and extract the numbers that follow. If there is no such indicator, you can just skip to the next line:
next if $line !~ /^RANGE (.*)$/;
Now, you can start extracting the numbers, splitting on the comma delimiter:
my @ranges = split /, /, $1;
Now you can extract the dashes and translate those into ranges. This is the tricky part: if the value has a dash in it, get the first and second numbers and turn them into a range with the .. operator; otherwise, leave the number alone:
@ranges = map { /(\d+)-(\d+)/ ? ($1 .. $2) : $_ } @ranges;
Putting all that together, and combining expressions, gives us:
my @numbers;
while (<$fh>)
{
chomp (my $line = $_);
next if $line !~ /^RANGE (.*)$/;
push @numbers, map { /(\d+)-(\d+)/ ? ($1 .. $2) : $_ } (split /, /, $1);
}
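To try the combined loop without a file on disk, one option is to open the filehandle on a string. The sketch below does that; the sample input, including the OTHER line that should be skipped, is an assumption made up for illustration.

```perl
use strict;
use warnings;

# Sample input held in a string; the OTHER line is there to show it being skipped.
my $input = "RANGE 34\nOTHER 1, 2\nRANGE 3-7, 15, 16, 2, 54\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @numbers;
while (<$fh>) {
    chomp(my $line = $_);
    next if $line !~ /^RANGE (.*)$/;
    # split builds its list from $1 before the regex inside map overwrites $1.
    push @numbers, map { /(\d+)-(\d+)/ ? ($1 .. $2) : $_ } split /, /, $1;
}
close $fh;

print "@numbers\n";
```

Since @numbers is declared outside the loop, values from every RANGE line accumulate in the one array.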
Upvotes: 4
Reputation: 139611
Filter duplicates with a hash:
#! /usr/bin/perl
use warnings;
use strict;
use 5.10.0;
my @tests = (
"RANGE 34",
"RANGE 45, 234",
"RANGE 2-99",
"RANGE 3-7, 15, 16, 2, 54",
);
for (@tests) {
my %hits;
@hits{$1 .. $2 // $1} = ()
while /(\d+)(?:-(\d+))?/g;
my @array = sort { $a <=> $b } keys %hits;
print "@array\n";
}
Output:
34
45 234
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2 3 4 5 6 7 15 16 54
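None of the sample lines above actually contains a repeated value, so the de-duplication is not visible in that output. A small sketch with overlapping input shows the effect; the sub name unique_values and the test line are assumptions for illustration, while the regex and the hash slice follow the code above.

```perl
use strict;
use warnings;
use 5.010;    # for the // operator

# Hypothetical helper wrapping the hash-slice idiom from the answer.
sub unique_values {
    my ($line) = @_;
    my %hits;
    # Each expanded value becomes a hash key, so repeats collapse into one entry.
    @hits{ $1 .. ($2 // $1) } = () while $line =~ /(\d+)(?:-(\d+))?/g;
    return sort { $a <=> $b } keys %hits;
}

# 5 and 6 appear both inside the range and on their own.
print join(' ', unique_values('RANGE 3-7, 5, 6, 2')), "\n";
```

The values come back sorted only because of the explicit numeric sort; hash keys on their own have no defined order.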
Upvotes: 1
Reputation: 213005
What about this?
First split the line into elements separated by commas, and then check each element for a '-' sign to create ranges:
if ($line =~ /RANGE ([\d\,\- ]+)/) {
my $paramtxt = $1;
my @elements = split(/\,/, $paramtxt);
for my $element (@elements) {
if ($element =~ /(\d+)\-(\d+)/) {
my $lower = $1;
my $upper = $2;
push @arr, $lower .. $upper;
} elsif ($element =~ /(\d+)/) {
my $solo = $1;
push @arr, $solo;
}
}
}
Upvotes: 3
Reputation: 28733
Take a look at the Text::NumericList module from CPAN. It can convert strings to arrays in a way similar to what you need:
use Text::NumericList;
my $list = Text::NumericList->new;
$list->set_string('1-3,5-7');
my @array = $list->get_array; # Returns (1,2,3,5,6,7)
You can at least look at its source code for ideas.
Upvotes: 5