neversaint
neversaint

Reputation: 63984

How can I partition a Perl array into equal sized chunks?

I have a fixed-sized array where the size of the array is always in factor of 3.

my @array = ('foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5);

How can I cluster the member of array such that we can get an array of array group by 3:

$VAR = [ ['foo','bar','qux'],
         ['foo1','bar','qux2'],
         [3, 4, 5] ];

Upvotes: 18

Views: 14844

Answers (9)

user2449151
user2449151

Reputation: 83

Use the spart function from the List::NSect package on CPAN.

    perl -e '
    use List::NSect qw{spart};
    use Data::Dumper qw{Dumper};
    my @array = ("foo", "bar", "qux", "foo1", "bar", "qux2", 3, 4, 5);
    my $var = spart(3, @array);
    print Dumper $var;
    '

    $VAR1 = [
          [
            'foo',
            'bar',
            'qux'
          ],
          [
            'foo1',
            'bar',
            'qux2'
          ],
          [
            3,
            4,
            5
          ]
        ];

Upvotes: 3

Brad Gilbert
Brad Gilbert

Reputation: 34120

As a learning experience I decided to do this in Perl6

The first, perhaps most simplest way I tried was to use map.

my @output := @array.map: -> $a, $b?, $c? { [ $a, $b // Nil, $c // Nil ] };
.say for @output;
foo bar qux
foo1 bar qux2
3 4 5

That didn't seem very scalable. What if I wanted to take the items from the list 10 at a time, that would get very annoying to write. ... Hmmm I did just mention "take" and there is a keyword named take lets try that in a subroutine to make it more generally useful.

sub at-a-time ( Iterable \sequence, Int $n where $_ > 0 = 1 ){
  my $is-lazy = sequence.is-lazy;
  my \iterator = sequence.iterator;

  # gather is used with take
  gather loop {
    my Mu @current;
    my \result = iterator.push-exactly(@current,$n);

    # put it into the sequence, and yield
    take @current.List;

    last if result =:= IterationEnd;
  }.lazy-if($is-lazy)
}

For kicks let's try it against an infinite list of the fibonacci sequence

my $fib = (1, 1, *+* ... *);
my @output = at-a-time( $fib, 3 );
.say for @output[^5]; # just print out the first 5
(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)

Notice that I used $fib instead of @fib. It was to prevent Perl6 from caching the elements of the Fibonacci sequence.
It might be a good idea to put it into a subroutine to create a new sequence everytime you need one, so that the values can get garbage collected when you are done with them.
I also used .is-lazy and .lazy-if to mark the output sequence lazy if the input sequence is. Since it was going into an array @output it would have tried to generate all of the elements from an infinite list before continuing onto the next line.


Wait a minute, I just remembered .rotor.

my @output = $fib.rotor(3);

.say for @output[^5]; # just print out the first 5
(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)

.rotor is actually far more powerful than I've demonstrated.

If you want it to return a partial match at the end you will need to add a :partial to the arguments of .rotor.

Upvotes: 3

Eugene Yarmash
Eugene Yarmash

Reputation: 149736

Another generic solution, non-destructive to the original array:

use Data::Dumper;

sub partition {
    my ($arr, $N) = @_; 

    my @res;
    my $i = 0;

    while ($i + $N-1 <= $#$arr) {
        push @res, [@$arr[$i .. $i+$N-1]];
        $i += $N; 
    }   

    if ($i <= $#$arr) {
        push @res, [@$arr[$i .. $#$arr]];
    }   
    return \@res;
}

print Dumper partition(
    ['foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5], 
    3   
);

The output:

$VAR1 = [
          [
            'foo',
            'bar',
            'qux'
          ],
          [
            'foo1',
            'bar',
            'qux2'
          ],
          [
            3,
            4,
            5
          ]
        ];

Upvotes: 3

user1981275
user1981275

Reputation: 13372

Below a more generic solution to the problem:

my @array = ('foo', 'bar', 1, 2);
my $n = 3;
my @VAR = map { [] } 1..$n;
my @idx = sort map { $_ % $n } 0..$#array;

for my $i ( 0..$#array ){
        push @VAR[ $idx[ $i ] ], @array[ $i ];
}

This also works when the number of items in the array is not a factor of 3. In the above example, the other solutions with e.g. splice would produce two arrays of length 2 and one of length 0.

Upvotes: 1

daotoad
daotoad

Reputation: 27183

I really like List::MoreUtils and use it frequently. However, I have never liked the natatime function. It doesn't produce output that can be used with a for loop or map or grep.

I like to chain map/grep/apply operations in my code. Once you understand how these functions work, they can be very expressive and very powerful.

But it is easy to make a function to work like natatime that returns a list of array refs.

sub group_by ($@) {
    my $n     = shift;
    my @array = @_;

    croak "group_by count argument must be a non-zero positive integer"
        unless $n > 0 and int($n) == $n;

    my @groups;
    push @groups, [ splice @array, 0, $n ] while @array;

    return @groups;
}

Now you can do things like this:

my @grouped = map [ reverse @$_ ],
              group_by 3, @array;

** Update re Chris Lutz's suggestions **

Chris, I can see merit in your suggested addition of a code ref to the interface. That way a map-like behavior is built in.

# equivalent to my map/group_by above
group_by { [ reverse @_ ] } 3, @array;

This is nice and concise. But to keep the nice {} code ref semantics, we have put the count argument 3 in a hard to see spot.

I think I like things better as I wrote it originally.

A chained map isn't that much more verbose than what we get with the extended API. With the original approach a grep or other similar function can be used without having to reimplement it.

For example, if the code ref is added to the API, then you have to do:

my @result = group_by { $_[0] =~ /foo/ ? [@_] : () } 3, @array;

to get the equivalent of:

my @result = grep $_->[0] =~ /foo/,
             group_by 3, @array;

Since I suggested this for the sake of easy chaining, I like the original better.

Of course, it would be easy to allow either form:

sub _copy_to_ref { [ @_ ] }

sub group_by ($@) {
    my $code = \&_copy_to_ref;
    my $n = shift;

    if( reftype $n eq 'CODE' ) {
        $code = $n;
        $n = shift;
    }

    my @array = @_;

    croak "group_by count argument must be a non-zero positive integer"
        unless $n > 0 and int($n) == $n;

    my @groups;
    push @groups, $code->(splice @array, 0, $n) while @array;

    return @groups;
}

Now either form should work (untested). I'm not sure whether I like the original API, or this one with the built in map capabilities better.

Thoughts anyone?

** Updated again **

Chris is correct to point out that the optional code ref version would force users to do:

group_by sub { foo }, 3, @array;

Which is not so nice, and violates expectations. Since there is no way to have a flexible prototype (that I know of), that puts the kibosh on the extended API, and I'd stick with the original.

On a side note, I started with an anonymous sub in the alternate API, but I changed it to a named sub because I was subtly bothered by how the code looked. No real good reason, just an intuitive reaction. I don't know if it matters either way.

Upvotes: 6

Brad Gilbert
Brad Gilbert

Reputation: 34120

my @VAR;
push @VAR, [ splice @array, 0, 3 ] while @array;

or you could use natatime from List::MoreUtils

use List::MoreUtils qw(natatime);

my @VAR;
{
  my $iter = natatime 3, @array;
  while( my @tmp = $iter->() ){
    push @VAR, \@tmp;
  }
}

Upvotes: 36

DVK
DVK

Reputation: 129373

Another answer (a variation on Tore's, using splice but avoiding the while loop in favor of more Perl-y map)

my $result = [ map { [splice(@array, 0, 3)] } (1 .. (scalar(@array) + 2) % 3) ];

Upvotes: 5

Tore Busch
Tore Busch

Reputation: 51

Or this:

my $VAR;
while( my @list = splice( @array, 0, 3 ) ) {
    push @$VAR, \@list;
}

Upvotes: 5

Adrian Pronk
Adrian Pronk

Reputation: 13906

Try this:

$VAR = [map $_ % 3 == 0 ? ([ $array[$_], $array[$_ + 1], $array[$_ + 2] ]) 
                        : (),
            0..$#array];

Upvotes: 4

Related Questions