Ωmega

Reputation: 43673

Remove duplicate array elements at any level of a hash?

Given a complex multi-level hash where some values are arrays and others are not, how can I remove duplicate array elements at any level of the hash?

Here is a simple example hash (the real one is much more complex):

$VAR1 = {
  'alpha' => {
    'code' => [
      {
        'x' => 1,
        'y' => 2
      },
      {
        'x' => 1,
        'y' => 2
      }
    ],
    'data' => {
      'domestic' => [
        {
          'a' => 0,
          'b' => 5
        },
        {
          'a' => 0,
          'b' => 5
        }
      ]
    }
  }
}

The hash contains arrays at different levels; some have only unique elements, while others contain duplicates. Sometimes an array element is itself a complex hash.

What is the right way to remove duplicates, however large or deeply nested they are, at any level?

Upvotes: 1

Views: 467

Answers (3)

Horus

Reputation: 1175

I'm not a fan of deep data structures that haven't been turned into objects, and fortunately Moose has type coercion built in, so you can objectify a deep structure almost like magic.

I went a bit overboard, but I decided to jot this up anyway as practice for myself. I could probably have gotten much better results by moving a few pieces into roles, or by forcing the coercion for Alpha::Keyed to build the result classes from a required field.

I don't fully like the way I coded this, and I didn't want to spend a ton of time on it, but it works for the structure you posted above. Making it handle a more complex structure would take a lot more work, and you'll want to split the code into separate class files:

Alpha.pm:

package Alpha;

use Moose;
use Moose::Util::TypeConstraints;

subtype 'AlphaCodes',
    as 'Alpha::Codes';

subtype 'AlphaData',
    as 'Alpha::Data';

coerce 'AlphaCodes',
    from 'ArrayRef[HashRef]',
    via { Alpha::Codes->new( data => $_ ) };

coerce 'AlphaData',
    from 'HashRef',
    via { Alpha::Data->new($_) };

has 'code' => (
    is => 'ro',
    isa => 'AlphaCodes',
    required => 1,
    coerce => 1);

has 'data' => (
    is => 'ro',
    isa => 'AlphaData',
    required => 1,
    coerce => 1);

package Alpha::Codes;

use Moose;
use Moose::Util::TypeConstraints;

extends 'Alpha::KeyedList';

subtype 'ArrayRefOfCodes',
    as 'ArrayRef[Alpha::Code]';

coerce 'ArrayRefOfCodes',
    from 'ArrayRef[HashRef]',
    via { [ map { Alpha::Code->new($_) } @$_ ] };

has 'data' => (
    is => 'ro',
    isa => 'ArrayRefOfCodes',
    required => 1,
    coerce => 1);

package Alpha::KeyedList;

use Moose;
use Moose::Util::TypeConstraints;

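# Return the items of ->data in their original order, keeping only the
# first item for each key (the key comes from build_key in the item class)
sub unique_list {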
    my $self = shift;
    my %seen = ();
    my @retval = ();
    foreach my $item ( @{$self->data} ) {
        unless ( $seen{$item->key} ) {
            push(@retval,$item);
            $seen{$item->key} = 1;
        }
    }
    return @retval;
}

package Alpha::Data;

use Moose;
use Moose::Util::TypeConstraints;

subtype 'AlphaDataDomestics',
    as 'Alpha::Data::Domestics';

coerce 'AlphaDataDomestics',
    from 'ArrayRef[HashRef]',
    via { Alpha::Data::Domestics->new(data => $_) };

has 'domestic' => (
    is => 'ro',
    isa => 'AlphaDataDomestics',
    required => 1,
    coerce => 1 );

package Alpha::Data::Domestics;

use Moose;
use Moose::Util::TypeConstraints;

extends 'Alpha::KeyedList';


subtype 'ArrayRefOfDomestics',
    as 'ArrayRef[Alpha::Data::Domestic]';

coerce 'ArrayRefOfDomestics',
    from 'ArrayRef[HashRef]',
    via { [ map { Alpha::Data::Domestic->new($_) } @$_ ] };

has 'data' => (
    is => 'ro',
    isa => 'ArrayRefOfDomestics',
    required => 1,
    coerce => 1);

package Alpha::Data::Domestic;

use Moose;

extends 'Alpha::Keyed';

has 'a' => ( is => 'ro' , isa => 'Str' , required => 1 );
has 'b' => ( is => 'ro' , isa => 'Str' , required => 1 );

sub build_key {
    my $self=  shift;
    return $self->a . '__' . $self->b;
}

package Alpha::Code;

use Moose;

extends 'Alpha::Keyed';

has 'x' => ( is => 'ro' , isa => 'Str' , required => 1 );
has 'y' => ( is => 'ro' , isa => 'Str' , required => 1 );

sub build_key {
    my $self=  shift;
    return $self->x . '__' . $self->y;
}

package Alpha::Keyed;

use Moose;

has 'key' => ( is => 'ro'
    , isa => 'Str'
    , builder => 'build_key'
    , lazy => 1 );

package main;

my $VAR1 = {
  'alpha' => {
    'code' => [
      {
        'x' => 1,
        'y' => 2
      },
      {
        'x' => 1,
        'y' => 2
      }
    ],
    'data' => {
      'domestic' => [
        {
          'a' => 0,
          'b' => 5
        },
        {
          'a' => 0,
          'b' => 5
        },
        {
          'a' => 1,
          'b' => 2
        },
      ]
    }
  }
};

my $alpha = Alpha->new($VAR1->{alpha});

use Data::Dumper;
warn Dumper([ $alpha->code->unique_list ]);
warn Dumper([ $alpha->data->domestic->unique_list ]);

1;

Running it produces:

$VAR1 = [
      bless( {
               'y' => 2,
               'x' => 1,
               'key' => '1__2'
             }, 'Alpha::Code' )
    ];
$VAR1 = [
      bless( {
               'a' => 0,
               'b' => 5,
               'key' => '0__5'
             }, 'Alpha::Data::Domestic' ),
      bless( {
               'a' => 1,
               'b' => 2,
               'key' => '1__2'
             }, 'Alpha::Data::Domestic' )
    ];
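The heart of `unique_list` is a first-seen filter: keep an item only if its key has not appeared before. As a rough sketch of that same filter without the Moose classes, the key below is built generically by serializing each item with the core Data::Dumper module (sorted keys, no indentation) instead of a hand-written `build_key` per class. `canonical_key` and `unique_by_key` are hypothetical names, and the sketch assumes that two items with identical serializations should count as duplicates.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: turn any (nested) Perl data structure into a
# canonical one-line string so it can be used as a %seen key.
sub canonical_key {
    my ($item) = @_;
    local $Data::Dumper::Sortkeys = 1;   # hash keys in a stable order
    local $Data::Dumper::Indent   = 0;   # single line, no whitespace
    local $Data::Dumper::Terse    = 1;   # omit the '$VAR1 =' prefix
    local $Data::Dumper::Deepcopy = 1;   # copy repeated refs instead of '$VAR1->...' pointers
    return scalar Dumper($item);
}

# Hypothetical helper: keep the first occurrence of each item, in order,
# using the same first-seen logic as unique_list above.
sub unique_by_key {
    my %seen;
    return grep { !$seen{ canonical_key($_) }++ } @_;
}

my @code   = ({ x => 1, y => 2 }, { x => 1, y => 2 }, { x => 3, y => 4 });
my @unique = unique_by_key(@code);
print scalar(@unique), "\n";    # prints 2
```

Because the key is derived from the data itself, nested hashes and arrays inside each item are handled without writing a class for every shape; the trade-off is that anything Data::Dumper prints differently is treated as different.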

Upvotes: 1

Borodin

Reputation: 126722

This code uses the Data::Compare module and seems to do what you need.

It traverses the data structure recursively, and every array it comes to is examined for duplicates using the Compare function from the module. Duplicates are removed as they are found.

use strict;
use warnings;

use Data::Compare 'Compare';

my %data = (
  alpha => {
    code => [{ x => 1, y => 2 }, { x => 1, y => 2 }],
    data => { domestic => [{ a => 0, b => 5 }, { a => 0, b => 5 }] },
  },
);

process_node(\%data);

use Data::Dump;
dd \%data;

sub process_node {

  my ($data) = @_;

  if (ref $data eq 'HASH') {
    process_node($_) for values %$data;
  }
  elsif (ref $data eq 'ARRAY') {

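    # Compare each element with every later one and splice out any later
    # element that Data::Compare considers identical to it
    my $i = 0;
    while ($i < @$data-1) {
      my $j = $i + 1;
      while ($j < @$data) {
        if (Compare(@{$data}[$i,$j])) {
          splice @$data, $j, 1;   # don't advance $j: the next element has shifted into slot $j
        }
        else {
          $j++;
        }
      }
      $i++;
    }

    # Then recurse into the elements that remain
    process_node($_) for @$data;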
  }
}

Output:

{
  alpha => {
    code => [{ x => 1, y => 2 }],
    data => { domestic => [{ a => 0, b => 5 }] },
  },
}
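One detail of `process_node` is worth noting: it removes duplicates from an array before recursing into that array's elements. Two elements that only become identical once their own inner arrays are cleaned up, such as `{ a => [1, 1] }` and `{ a => [1] }`, therefore both survive. The sketch below processes children first instead. It also swaps Data::Compare for a canonical Data::Dumper string (a substitution, not part of the answer above), so each array can be deduplicated in one pass with a `%seen` hash rather than by comparing every pair. `dedupe_deep` and `canonical_key` are hypothetical names, and equal serializations are assumed to mean duplicates.

```perl
use strict;
use warnings;
use Data::Dumper;

# Canonical one-line serialization used as a duplicate-detection key
# (assumption: equal serializations are treated as duplicates).
sub canonical_key {
    my ($item) = @_;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 0;
    local $Data::Dumper::Terse    = 1;
    local $Data::Dumper::Deepcopy = 1;
    return scalar Dumper($item);
}

# Hypothetical name. Walks the structure depth-first; for every array it
# first cleans up each element, then drops later copies of earlier elements.
sub dedupe_deep {
    my ($node) = @_;
    if (ref $node eq 'HASH') {
        dedupe_deep($_) for values %$node;
    }
    elsif (ref $node eq 'ARRAY') {
        dedupe_deep($_) for @$node;        # children first
        my %seen;
        @$node = grep { !$seen{ canonical_key($_) }++ } @$node;
    }
    return $node;
}

my %data = (
    alpha => {
        code => [ { x => 1, y => 2 }, { x => 1, y => 2 } ],
        data => { domestic => [ { a => [1, 1] }, { a => [1] } ] },
    },
);
dedupe_deep(\%data);
print canonical_key(\%data), "\n";    # show the cleaned structure on one line
```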

Upvotes: 1

Ilion

Reputation: 6872

I would look at the answers to this question: How can I compare arrays in Perl?

Using that, you should be able to iterate through all levels of your hash and compare the arrays at each array level. You would, of course, need to do this for every possible pairing of arrays.
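To apply the idea from that question here, the comparison has to recurse into nested hashes and arrays as well. A minimal sketch in core Perl follows, assuming plain scalars should be compared as strings. `deep_equal` and `remove_duplicates` are hypothetical names; blessed objects are only equal if they are the very same reference, and checking each element against every kept element makes this quadratic in the array's length.

```perl
use strict;
use warnings;

# Hypothetical helper: structural equality for nested hashes and arrays.
# Plain scalars are compared as strings; any other reference (including a
# blessed object) is equal only to the very same reference.
sub deep_equal {
    my ($x, $y) = @_;
    return !defined($y) if !defined($x);
    return 0 if !defined($y);
    return 0 if ref($x) ne ref($y);

    if (ref($x) eq 'ARRAY') {
        return 0 if @$x != @$y;                      # different lengths
        for my $i (0 .. $#$x) {
            return 0 unless deep_equal($x->[$i], $y->[$i]);
        }
        return 1;
    }
    if (ref($x) eq 'HASH') {
        return 0 if scalar(keys %$x) != scalar(keys %$y);
        for my $k (keys %$x) {
            return 0 unless exists($y->{$k}) && deep_equal($x->{$k}, $y->{$k});
        }
        return 1;
    }
    return $x eq $y;
}

# Hypothetical helper: keep the first occurrence of each element of one
# array, comparing every element against the ones already kept.
sub remove_duplicates {
    my ($array) = @_;
    my @kept;
    ITEM: for my $item (@$array) {
        for my $seen (@kept) {
            next ITEM if deep_equal($item, $seen);
        }
        push @kept, $item;
    }
    @$array = @kept;
    return $array;
}

my $domestic = [ { a => 0, b => 5 }, { a => 0, b => 5 }, { a => 1, b => 2 } ];
remove_duplicates($domestic);
print scalar(@$domestic), "\n";    # prints 2
```

To cover every level, `remove_duplicates` would be called from a recursive walk over the hash, in the same way `process_node` recurses in the previous answer.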

If you could instead give your items keys that identify them (that is, store them in a hash rather than an array), you wouldn't need to worry about this, since each hash key is unique.

Upvotes: 0
