Dr. Faust
Dr. Faust

Reputation: 1021

Should Perl hashes always contain values?

I had an earlier question that received the following response from the noted Perl expert, Perl author and Perl trainer brian d foy:

[If] you're looking for a fixed sequence of characters at the end of each filename. You want to know if that fixed sequence is in a list of sequences that interest you. Store all the extensions in a hash and look in that hash:
    my( $extension ) = $filename =~ m/\.([^.]+)$/;
    if( exists $hash{$extension} ) { ... }
You don't need to build up a regular expression, and you don't need to go through several possible regex alternations to check every extension you have to examine.

Thanks for the advice brian.

What I now want to know is what is the best practice in a case like the above. Should one only define the keys, which is all I need to achieve what's described above, or should one always define a value as well?

Upvotes: 8

Views: 6319

Answers (6)

daotoad
daotoad

Reputation: 27183

Storing '1' in a Set-hash Considered Harmful

I know using Considered Harmful is considered harmful, but this is bad, almost as bad as unrestrained goto usage.

Ok, I've harped on this in a few comments, but I think I need a full response to demonstrate the issue.

Let's say we have a daemon process that provides back-end inventory control for a shop that sells widgets.

my @items = qw(
    widget
    thingy
    whozit
    whatsit
);

my @items_in_stock = qw(
    widget
    thingy
);

my %in_stock;
my @in_stock(@items_in_stock) = (1) x @items_in_stock;  #initialize all keys to 1

sub Process_Request {
    my $request = shift;

    if( $request eq REORDER ) {
        Reorder_Items(\@items, \%in_stock);
    }
    else { 
        Error_Response( ILLEGAL_REQUEST );
    }
}

sub Reorder_Items{
   my $items = shift;
   my $in_stock =  shift;

   # Order items we do not have in-stock.
   for my $item ( @$items ) {
       
       Reorder_Item( $item ) 
           if not exists $in_stock->{$item};
   }

}

The tool is great, it automatically keeps items in stock. Very nice. Now, the boss asks for automatically generated catalogs of in-stock items. So we modify Process_Request() and add catalog generation.

sub Process_Request {
    my $request = shift;

    if( $request eq REORDER ) {
        Reorder_Items(\@items, \%in_stock);
    }
    if( $request eq CATALOG ) {
        Build_Catalog(\@items, \%in_stock);
    }
    else { 
        Error_Response( ILLEGAL_REQUEST );
    }
}

sub Build_Catalog {
    my $items = shift;
    my $in_stock = shift;

    my $catalog_response = '';
    foreach my $item ( @$items ) {
        $catalog_response .= Catalog_Item($item)
            if $in_stock->{$item};
    }

    return $catalog_response;
} 

In testing, Build_Catalog() works fine. Hooray, we go live with the app.

Oops. For some reason nothing is being ordered, the company runs out of stock of everything.

The Build_Catalog() routine adds keys to %in_stock, so Reorder_Items() now sees everything as in stock and never makes an order.

Using Hash::Util's lock_hash can help prevent accidental hash modification. If we locked %in_stock before calling Build_Catalog() we would have gotten a fatal error and would never have gone live with the bug.

In summary, it is best to test existence of keys rather than truth of your set-hash values. If you are using existence as a signifier, don't set your values to '1' because that will mask bugs and make them harder to track. Using lock_hash can help catch these problems.

If you must check for the truth of the values, do so in every case.

Upvotes: 1

Dan
Dan

Reputation:

Using undef as a value in hash is more memory efficient than storing 1.

Upvotes: 2

Chas. Owens
Chas. Owens

Reputation: 64919

As others have said, the idiomatic solution for a hashset (a hash that only contains keys, not values) is to use 1 as the value because this makes the testing for existence easy. However, there is something to be said for using undef as the value. It will force the users to test for existence with exists which is marginally faster. Of course, you could test for existence with exists even when the value is 1 and avoid the inevitable bugs from users who forget to use exists.

Upvotes: 3

Michael Carman
Michael Carman

Reputation: 30831

It's usually preferable to set a defined value for every key. The idiomatic value (when you don't care about the value) is 1.

my %hash = map { $_ => 1 } @array;

Doing it this way makes the code the uses the hash slightly simpler because you can use $hash{key} as a Boolean value. If the value can be undefined you need to use the more verbose exists $hash{key} instead.

That said, there are situations where a value of undef is desirable. For example: imagine that you're parsing C header files to extract preprocessor symbols. It would be logical to store these in a hash of name => value pairs.

#define FOO 1
#define BAR

In Perl, this would map to:

my %symbols = ( FOO => 1, BAR => undef);

In C a #define defines a symbol, not a value -- "defined" in C is mapped to "exists" in Perl.

Upvotes: 7

innaM
innaM

Reputation: 47829

undef is a value.

Of course, stuff like that is always depndent on what you are currently doing. But $foo{bar} is just a variable like $bar and I don't see any reason why either one should not be undef every now and then.

PS: That's why exists exists.

Upvotes: 3

Jeremy Wall
Jeremy Wall

Reputation: 25237

You can't create a hash key without a value. The value can be undef but it will be there. How else would you construct a hash. Or was your question regarding whether the value can be undef? In which case I would say that the value you store there (undef, 1, 0...) is entirely up to you. If a lot of folks are using it then you probably want to store some true value though incase some one else uses if ($hash{$extension}) {...} instead of exists because they weren't paying attention.

Upvotes: 5

Related Questions