Reputation: 59
I have a hash of hashes of arrays. The keys to the hashes are $duration and $attr. I want to sort descending ($b <=> $a) and remove only those duplicate values that have equal duration. In the snippet these should be the streams 'h264/AVC, 1080p24 /1.001 (16:9)' & 'AC3, English, multi-channel, 48kHz' with duration '26', but not the duplicate values with $duration '2124' & '115'.
There are countless examples for removing duplicates, and I've tried everything I could find to adapt them to my needs, but with no success. What should my approach be? Thanks.
my ( %recordings_by_dur_attr ) = ();
push( @{ $recordings_by_dur_attr{ $duration }{ $attr } }, @stream );
print Data::Dumper->Dump( [\%recordings_by_dur_attr] );
Result:
$VAR1 = {
'2124' => {
'00300.mpls, 00-35-24' => [
'',
'h264/AVC, 480i60 /1.001 (16:9)',
'AC3, English, stereo, 48kHz'
]
},
'50' => {
'00021.mpls, 00-00-50' => [
'',
'h264/AVC, 1080p24 /1.001 (16:9)',
'AC3, English, multi-channel, 48kHz'
]
},
'6528' => {
'00800.mpls, 01-48-48' => [
'',
'Chapters, 18 chapters',
'h264/AVC, 1080p24 /1.001 (16:9)',
'DTS, Japanese, stereo, 48kHz',
'DTS Master Audio, English, stereo, 48kHz',
'DTS, French, stereo, 48kHz',
'DTS, Italian, stereo, 48kHz',
'DTS, German, stereo, 48kHz',
'DTS, Spanish, stereo, 48kHz',
'DTS, Portuguese, stereo, 48kHz',
'DTS, Spanish, stereo, 48kHz',
'DTS, Russian, stereo, 48kHz'
]
},
'26' => {
'01103.mpls, 00-00-26' => [
'',
'h264/AVC, 1080p24 /1.001 (16:9)',
'AC3, English, multi-channel, 48kHz'
],
'01102.mpls, 00-00-26' => [
'',
'h264/AVC, 1080p24 /1.001 (16:9)',
'AC3, English, multi-channel, 48kHz'
],
'00011.mpls, 00-00-26' => [
'',
'h264/AVC, 1080p24 /1.001 (16:9)',
'AC3, English, multi-channel, 48kHz'
]
},
'115' => {
'00304.mpls, 00-01-55' => [
'',
'h264/AVC, 480i60 /1.001 (16:9)',
'AC3, English, stereo, 48kHz'
]
}
};
Duplicate structure
'',
'h264/AVC, 1080p24 /1.001 (16:9)',
'AC3, English, multi-channel, 48kHz'
Wanted result with removed duplicate structure:
$VAR1 = {
'2124' => {
'00300.mpls, 00-35-24' => [
'',
'h264/AVC, 480i60 /1.001 (16:9)',
'AC3, English, stereo, 48kHz'
]
},
'50' => {
'00021.mpls, 00-00-50' => [
'',
'h264/AVC, 1080p24 /1.001 (16:9)',
'AC3, English, multi-channel, 48kHz'
]
},
'6528' => {
'00800.mpls, 01-48-48' => [
'',
'Chapters, 18 chapters',
'h264/AVC, 1080p24 /1.001 (16:9)',
'DTS, Japanese, stereo, 48kHz',
'DTS Master Audio, English, stereo, 48kHz',
'DTS, French, stereo, 48kHz',
'DTS, Italian, stereo, 48kHz',
'DTS, German, stereo, 48kHz',
'DTS, Spanish, stereo, 48kHz',
'DTS, Portuguese, stereo, 48kHz',
'DTS, Spanish, stereo, 48kHz',
'DTS, Russian, stereo, 48kHz'
]
},
'26' => {
'00011.mpls, 00-00-26' => [
'',
'h264/AVC, 1080p24 /1.001 (16:9)',
'AC3, English, multi-channel, 48kHz'
]
},
'115' => {
'00304.mpls, 00-01-55' => [
'',
'h264/AVC, 480i60 /1.001 (16:9)',
'AC3, English, stereo, 48kHz'
]
}
};
Post processing
for my $duration ( sort { $b <=> $a } keys %recordings_by_dur_attr ) {
for my $attr ( keys %{ $recordings_by_dur_attr{ $duration } } ) {
#Remove duplicate structures
my @stream = @{ $recordings_by_dur_attr{ $duration }{ $attr } };
my ( $mpls, $hms ) = ( $attr =~ /(\d+\.mpls), (\d+-\d+-\d+)$/ );
for ( my $i = 1; $i < @stream; $i++ ) {
#extract info from each stream
}
}
}
Upvotes: 0
Views: 1626
Reputation: 385655
The expression $seen{$candidate}++ is useful for finding duplicates. When it returns true, $candidate has previously been seen. It is most often used as follows:
my @uniq = grep !$seen{$_}++, @list;
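To make the idiom concrete, here is a minimal, self-contained sketch. The contents of @list are made-up sample values, not data from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the question.
my @list = qw(a b a c b);

my %seen;
# $seen{$_}++ returns the count *before* incrementing: 0 (false) the first
# time a value appears, so the negation keeps only first occurrences.
my @uniq = grep { !$seen{$_}++ } @list;

print "@uniq\n";   # prints "a b c"
```

The original order of first occurrences is preserved, which is why this form is usually preferred over collecting the keys of a hash.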
Instead of building a list of keys of elements to keep, I inverted the condition to build a list of keys of elements to delete.
sub id { pack 'N/(N/a*)', @{ $_[0] } }
for my $recordings_by_attr (values(%recordings_by_dur_attr)) {
my %seen;
delete @{$recordings_by_attr}{
grep $seen{id($recordings_by_attr->{$_})}++,
sort
keys %$recordings_by_attr
};
}
The sort decides which of the duplicates to remove: the first key in sorted order is kept. If you don't care which, you can remove the sort.
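The following is a sketch of the deletion loop above run end to end on a reduced copy of the question's data, followed by the descending sort the question asks for. Two things here are assumptions rather than part of the answer: the @trailer variable is only a shorthand for the repeated stream list, and id uses the template '(N/a*)*', a variation on the answer's pack-based id that likewise length-prefixes every string.

```perl
use strict;
use warnings;

# Reduced sample copied from the dump in the question.
# @trailer is a hypothetical shorthand for the repeated stream list.
my @trailer = ('', 'h264/AVC, 1080p24 /1.001 (16:9)', 'AC3, English, multi-channel, 48kHz');
my %recordings_by_dur_attr = (
    26 => {
        '01103.mpls, 00-00-26' => [@trailer],
        '01102.mpls, 00-00-26' => [@trailer],
        '00011.mpls, 00-00-26' => [@trailer],
    },
    50 => {
        '00021.mpls, 00-00-50' => [@trailer],
    },
    2124 => {
        '00300.mpls, 00-35-24' => ['', 'h264/AVC, 480i60 /1.001 (16:9)', 'AC3, English, stereo, 48kHz'],
    },
);

# Serialize an array of strings into a single string usable as a hash key.
# Each element is length-prefixed, so different arrays cannot collide.
sub id { pack '(N/a*)*', @{ $_[0] } }

for my $recordings_by_attr (values %recordings_by_dur_attr) {
    my %seen;    # reset per duration, so only equal durations are compared
    my @dupes = grep { $seen{ id($recordings_by_attr->{$_}) }++ }
                sort keys %$recordings_by_attr;
    delete @{$recordings_by_attr}{@dupes};
}

# Report what is left, longest duration first.
for my $duration (sort { $b <=> $a } keys %recordings_by_dur_attr) {
    for my $attr (sort keys %{ $recordings_by_dur_attr{$duration} }) {
        print "$duration: $attr\n";
    }
}
```

Because %seen is declared inside the outer loop, the entry under '50' survives even though its streams match those under '26'; only '01102.mpls' and '01103.mpls' are deleted.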
Upvotes: 1
Reputation: 1342
Steps:
1. Traverse the hash.
2. If the value is an array reference (ref $hash->{$key} eq "ARRAY"):
1. my `@temp = uniq(@{$hash->{$key}})`;
2. $hash->{$key} = \@temp;
3. Else, if the value is a hash reference:
1. Traverse that hash in the same way.
4. Else
1. next;
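The steps above can be sketched as a small recursive routine. The names uniq and dedupe_arrays and the sample data layout are assumptions made for illustration; uniq is written out with the %seen idiom so that no module is required.

```perl
use strict;
use warnings;

# Hypothetical helper: order-preserving uniq built on the %seen idiom.
sub uniq { my %seen; return grep { !$seen{$_}++ } @_ }

# Hypothetical helper: walk nested hashes and deduplicate every array found.
sub dedupe_arrays {
    my ($hash) = @_;
    for my $key (keys %$hash) {
        my $value = $hash->{$key};
        if (ref $value eq 'ARRAY') {
            $hash->{$key} = [ uniq(@$value) ];   # replace with deduplicated copy
        }
        elsif (ref $value eq 'HASH') {
            dedupe_arrays($value);               # descend one level
        }
        # Any other kind of value is left untouched.
    }
    return;
}

# Reduced sample based on the '6528' entry in the question.
my %data = (
    6528 => {
        '00800.mpls, 01-48-48' => [
            'DTS, Spanish, stereo, 48kHz',
            'DTS, Portuguese, stereo, 48kHz',
            'DTS, Spanish, stereo, 48kHz',
        ],
    },
);

dedupe_arrays(\%data);
print "$_\n" for @{ $data{6528}{'00800.mpls, 01-48-48'} };
```

Note that this removes repeated elements inside each array; it does not compare whole arrays stored under different keys, so the three identical entries under duration '26' in the question would all remain.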
Upvotes: 0