theuserid01

Reputation: 59

How to remove duplicate values from hash of arrays with 2 references

I have a hash of hashes of arrays, keyed first by $duration and then by $attr. I want to walk the durations in descending order ($b <=> $a) and remove duplicate stream lists, but only when they share the same duration. In the snippet below, the duplicates to remove are the streams

'h264/AVC, 1080p24 /1.001 (16:9)' and 'AC3, English, multi-channel, 48kHz' under duration '26'. The identical lists under durations '2124' and '115' must stay, because their durations differ.

There are countless examples of removing duplicates, and I've tried to adapt everything I could find, without success. What approach should I take? Thanks.

use Data::Dumper;

my %recordings_by_dur_attr;

push( @{ $recordings_by_dur_attr{ $duration }{ $attr } }, @stream );

print Data::Dumper->Dump( [\%recordings_by_dur_attr] );
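To reproduce the structure, here is a minimal, self-contained sketch. It assumes (the question doesn't show this) that $duration is the playlist length in seconds taken from the hh-mm-ss part of $attr, which agrees with the sample data: 00-35-24 is 2124 seconds. The @playlists input and the hms_to_seconds helper are hypothetical.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: "01-48-48" -> 6528 seconds.
# Assumes $duration is derived this way; the question does not say so.
sub hms_to_seconds {
    my ($hms) = @_;
    my ( $h, $m, $s ) = split /-/, $hms;
    return $h * 3600 + $m * 60 + $s;
}

# Hypothetical input: one [ $attr, @stream ] record per playlist.
my @playlists = (
    [ '00300.mpls, 00-35-24', '', 'h264/AVC, 480i60 /1.001 (16:9)',  'AC3, English, stereo, 48kHz' ],
    [ '01103.mpls, 00-00-26', '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
    [ '00011.mpls, 00-00-26', '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
);

my %recordings_by_dur_attr;
for my $playlist (@playlists) {
    my ( $attr, @stream ) = @$playlist;
    my ($hms) = $attr =~ /(\d+-\d+-\d+)$/;
    my $duration = hms_to_seconds($hms);

    # Autovivification creates the inner hash and the array on first use
    push( @{ $recordings_by_dur_attr{ $duration }{ $attr } }, @stream );
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Data::Dumper->Dump( [ \%recordings_by_dur_attr ] );
```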

Result:

$VAR1 = {
      '2124' => {
                  '00300.mpls, 00-35-24' => [
                                              '',
                                              'h264/AVC, 480i60 /1.001 (16:9)',
                                              'AC3, English, stereo, 48kHz'
                                            ]
                },
      '50' => {
                '00021.mpls, 00-00-50' => [
                                            '',
                                            'h264/AVC, 1080p24 /1.001 (16:9)',
                                            'AC3, English, multi-channel, 48kHz'
                                          ]
              },
      '6528' => {
                  '00800.mpls, 01-48-48' => [
                                              '',
                                              'Chapters, 18 chapters',
                                              'h264/AVC, 1080p24 /1.001 (16:9)',
                                              'DTS, Japanese, stereo, 48kHz',
                                              'DTS Master Audio, English, stereo, 48kHz',
                                              'DTS, French, stereo, 48kHz',
                                              'DTS, Italian, stereo, 48kHz',
                                              'DTS, German, stereo, 48kHz',
                                              'DTS, Spanish, stereo, 48kHz',
                                              'DTS, Portuguese, stereo, 48kHz',
                                              'DTS, Spanish, stereo, 48kHz',
                                              'DTS, Russian, stereo, 48kHz'
                                            ]
                },
      '26' => {
                '01103.mpls, 00-00-26' => [
                                            '',
                                            'h264/AVC, 1080p24 /1.001 (16:9)',
                                            'AC3, English, multi-channel, 48kHz'
                                          ],
                '01102.mpls, 00-00-26' => [
                                            '',
                                            'h264/AVC, 1080p24 /1.001 (16:9)',
                                            'AC3, English, multi-channel, 48kHz'
                                          ],
                '00011.mpls, 00-00-26' => [
                                            '',
                                            'h264/AVC, 1080p24 /1.001 (16:9)',
                                            'AC3, English, multi-channel, 48kHz'
                                          ]
              },
      '115' => {
                 '00304.mpls, 00-01-55' => [
                                             '',
                                             'h264/AVC, 480i60 /1.001 (16:9)',
                                             'AC3, English, stereo, 48kHz'
                                           ]
               }
    };

The duplicated structure:

'',
'h264/AVC, 1080p24 /1.001 (16:9)',
'AC3, English, multi-channel, 48kHz'

Wanted result, with the duplicate structures removed:

$VAR1 = {
      '2124' => {
                  '00300.mpls, 00-35-24' => [
                                              '',
                                              'h264/AVC, 480i60 /1.001 (16:9)',
                                              'AC3, English, stereo, 48kHz'
                                            ]
                },
      '50' => {
                '00021.mpls, 00-00-50' => [
                                            '',
                                            'h264/AVC, 1080p24 /1.001 (16:9)',
                                            'AC3, English, multi-channel, 48kHz'
                                          ]
              },
      '6528' => {
                  '00800.mpls, 01-48-48' => [
                                              '',
                                              'Chapters, 18 chapters',
                                              'h264/AVC, 1080p24 /1.001 (16:9)',
                                              'DTS, Japanese, stereo, 48kHz',
                                              'DTS Master Audio, English, stereo, 48kHz',
                                              'DTS, French, stereo, 48kHz',
                                              'DTS, Italian, stereo, 48kHz',
                                              'DTS, German, stereo, 48kHz',
                                              'DTS, Spanish, stereo, 48kHz',
                                              'DTS, Portuguese, stereo, 48kHz',
                                              'DTS, Spanish, stereo, 48kHz',
                                              'DTS, Russian, stereo, 48kHz'
                                            ]
                },
      '26' => {
                '00011.mpls, 00-00-26' => [
                                            '',
                                            'h264/AVC, 1080p24 /1.001 (16:9)',
                                            'AC3, English, multi-channel, 48kHz'
                                          ]
              },
      '115' => {
                 '00304.mpls, 00-01-55' => [
                                             '',
                                             'h264/AVC, 480i60 /1.001 (16:9)',
                                             'AC3, English, stereo, 48kHz'
                                           ]
               }
    };

Post-processing:

for my $duration ( sort { $b <=> $a } keys %recordings_by_dur_attr ) {
    # keys needs a real hash: "keys $hashref" was experimental and is an error since Perl 5.24
    for my $attr ( keys %{ $recordings_by_dur_attr{ $duration } } ) {

        # Remove duplicate structures

        my @stream = @{ $recordings_by_dur_attr{ $duration }{ $attr } };
        my ( $mpls, $hms ) = ( $attr =~ /(\d+\.mpls), (\d+-\d+-\d+)$/ );

        # Element 0 is always '', so start at index 1
        for my $i ( 1 .. $#stream ) {

            # Extract info from each stream

        }
    }
}
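One possible way to fill in the loop, sketched under the assumption that no stream description contains a NUL character, so join "\0", @stream is an unambiguous key for a stream list. A fresh %seen per duration means identical lists under different durations (2124 and 115, or 50 and 26) are never compared. The data is trimmed to the entries that matter.

```perl
use strict;
use warnings;

my %recordings_by_dur_attr = (
    '2124' => { '00300.mpls, 00-35-24' => [ '', 'h264/AVC, 480i60 /1.001 (16:9)', 'AC3, English, stereo, 48kHz' ] },
    '115'  => { '00304.mpls, 00-01-55' => [ '', 'h264/AVC, 480i60 /1.001 (16:9)', 'AC3, English, stereo, 48kHz' ] },
    '50'   => { '00021.mpls, 00-00-50' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ] },
    '26'   => {
        '01103.mpls, 00-00-26' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
        '01102.mpls, 00-00-26' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
        '00011.mpls, 00-00-26' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
    },
);

for my $duration ( sort { $b <=> $a } keys %recordings_by_dur_attr ) {
    my $by_attr = $recordings_by_dur_attr{ $duration };

    # %seen is reset for every duration, so only same-duration lists collide
    my %seen;

    # The key list is built before the loop runs, so deleting inside it is safe.
    # Sorting makes it deterministic: the first key in string order is kept.
    for my $attr ( sort keys %$by_attr ) {
        my $key = join "\0", @{ $by_attr->{ $attr } };    # assumes no "\0" in the data
        delete $by_attr->{ $attr } if $seen{ $key }++;
    }
}
```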

Upvotes: 0

Views: 1626

Answers (2)

ikegami

Reputation: 385655

The expression $seen{$candidate}++ is useful for finding duplicates. When it returns true, $candidate has previously been seen. It is most often used as follows:

my %seen;
my @uniq = grep !$seen{$_}++, @list;
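A small demonstration of the idiom on a hypothetical list:

```perl
use strict;
use warnings;

my @list = qw( a b a c b );

# $seen{$_}++ returns the old count (undef, i.e. false, the first time) and
# then increments it, so !$seen{$_}++ is true only for the first occurrence
my %seen;
my @uniq = grep !$seen{$_}++, @list;

print "@uniq\n";    # prints "a b c"
```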

Here I inverted the condition: instead of building a list of the elements to keep, it builds a list of the keys of the elements to delete.
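Dropping the ! is what inverts it: grep then returns every repeat after the first occurrence, which is exactly the set of elements to delete. Same hypothetical list:

```perl
use strict;
use warnings;

my @list = qw( a b a c b );

# Without the !, grep keeps every occurrence after the first one
my %seen;
my @dups = grep $seen{$_}++, @list;

print "@dups\n";    # prints "a b"
```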

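# Serialize an array into one string: the element count, then each element
# prefixed with its length, so two arrays give the same string only if they are equal
sub id { pack 'N/(N/a*)', @{ $_[0] } }

for my $recordings_by_attr (values(%recordings_by_dur_attr)) {
   # %seen is per duration, so only lists with the same duration are compared
   my %seen;
   delete @{$recordings_by_attr}{
       grep $seen{id($recordings_by_attr->{$_})}++,
        sort
         keys %$recordings_by_attr
   };
}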
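A runnable version of the code above on a trimmed copy of the question's data; the id sub and the loop are as posted. Afterwards only '00011.mpls, 00-00-26' remains under 26, while the identical lists under 2124 and 115 (and under 50) are left alone.

```perl
use strict;
use warnings;

sub id { pack 'N/(N/a*)', @{ $_[0] } }

my %recordings_by_dur_attr = (
    '2124' => { '00300.mpls, 00-35-24' => [ '', 'h264/AVC, 480i60 /1.001 (16:9)', 'AC3, English, stereo, 48kHz' ] },
    '115'  => { '00304.mpls, 00-01-55' => [ '', 'h264/AVC, 480i60 /1.001 (16:9)', 'AC3, English, stereo, 48kHz' ] },
    '50'   => { '00021.mpls, 00-00-50' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ] },
    '26'   => {
        '01103.mpls, 00-00-26' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
        '01102.mpls, 00-00-26' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
        '00011.mpls, 00-00-26' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
    },
);

for my $recordings_by_attr ( values(%recordings_by_dur_attr) ) {
    my %seen;

    # Hash-slice delete of every key whose stream list was already seen
    delete @{$recordings_by_attr}{
        grep $seen{ id( $recordings_by_attr->{$_} ) }++,
            sort
                keys %$recordings_by_attr
    };
}
```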

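The sort decides which of the duplicates is kept: grep lets the first occurrence through, so the key that comes first in string order (here '00011.mpls, 00-00-26') survives and the later ones are deleted. If you don't care which one is kept, you can remove the sort.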
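For instance, to keep the key that sorts last instead of the first, the sort can be reversed. A sketch on just the duration-26 entries, reusing the same id sub:

```perl
use strict;
use warnings;

sub id { pack 'N/(N/a*)', @{ $_[0] } }

my %recordings_by_attr = (
    '01103.mpls, 00-00-26' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
    '01102.mpls, 00-00-26' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
    '00011.mpls, 00-00-26' => [ '', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz' ],
);

my %seen;

# reverse sort walks the keys from last to first in string order,
# so the key that sorts last ('01103...') is the one that survives
delete @recordings_by_attr{
    grep $seen{ id( $recordings_by_attr{$_} ) }++,
        reverse sort keys %recordings_by_attr
};
```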

Upvotes: 1

Krishnachandra Sharma

Reputation: 1342

Steps:

    1. Traverse the hash.
    2. If ref $hash->{$key} eq "ARRAY", then
       1. my @temp = uniq(@{ $hash->{$key} });
       2. $hash->{$key} = \@temp;
    3. Else, if it is a hash reference, traverse that hash the same way.
    4. Else
       1. next;
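A rough Perl rendering of these steps, assuming uniq means the one from List::Util (available in List::Util 1.45 and later); the helper name dedupe_arrays is hypothetical. Note that this removes duplicate strings within each array, such as the repeated 'DTS, Spanish, stereo, 48kHz' under 6528; it does not compare whole arrays across $attr keys.

```perl
use strict;
use warnings;
use List::Util qw( uniq );    # uniq requires List::Util 1.45+

# Hypothetical helper following the steps: replace every array with its
# de-duplicated copy and recurse into nested hashes; anything else is skipped
sub dedupe_arrays {
    my ($hash) = @_;
    for my $key ( keys %$hash ) {
        my $value = $hash->{$key};
        if ( ref $value eq 'ARRAY' ) {
            my @temp = uniq @$value;
            $hash->{$key} = \@temp;
        }
        elsif ( ref $value eq 'HASH' ) {
            dedupe_arrays($value);
        }
    }
}

my %recordings_by_dur_attr = (
    '6528' => {
        '00800.mpls, 01-48-48' => [
            '', 'Chapters, 18 chapters', 'DTS, Spanish, stereo, 48kHz',
            'DTS, Portuguese, stereo, 48kHz', 'DTS, Spanish, stereo, 48kHz',
        ],
    },
);

dedupe_arrays( \%recordings_by_dur_attr );

print join( ' | ', @{ $recordings_by_dur_attr{6528}{'00800.mpls, 01-48-48'} } ), "\n";
```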

Upvotes: 0
