Reputation: 1021
I had an earlier question that received the following response from the noted Perl expert, Perl author and Perl trainer brian d foy:
[If] you're looking for a fixed sequence of characters at the end of each filename. You want to know if that fixed sequence is in a list of sequences that interest you. Store all the extensions in a hash and look in that hash:my( $extension ) = $filename =~ m/\.([^.]+)$/; if( exists $hash{$extension} ) { ... }You don't need to build up a regular expression, and you don't need to go through several possible regex alternations to check every extension you have to examine.
Thanks for the advice brian.
What I now want to know is what is the best practice in a case like the above. Should one only define the keys, which is all I need to achieve what's described above, or should one always define a value as well?
Upvotes: 8
Views: 6319
Reputation: 27183
I know using Considered Harmful is considered harmful, but this is bad, almost as bad as unrestrained goto
usage.
Ok, I've harped on this in a few comments, but I think I need a full response to demonstrate the issue.
Let's say we have a daemon process that provides back-end inventory control for a shop that sells widgets.
my @items = qw(
widget
thingy
whozit
whatsit
);
my @items_in_stock = qw(
widget
thingy
);
my %in_stock;
my @in_stock(@items_in_stock) = (1) x @items_in_stock; #initialize all keys to 1
sub Process_Request {
my $request = shift;
if( $request eq REORDER ) {
Reorder_Items(\@items, \%in_stock);
}
else {
Error_Response( ILLEGAL_REQUEST );
}
}
sub Reorder_Items{
my $items = shift;
my $in_stock = shift;
# Order items we do not have in-stock.
for my $item ( @$items ) {
Reorder_Item( $item )
if not exists $in_stock->{$item};
}
}
The tool is great, it automatically keeps items in stock. Very nice. Now, the boss asks for automatically generated catalogs of in-stock items. So we modify Process_Request()
and add catalog generation.
sub Process_Request {
my $request = shift;
if( $request eq REORDER ) {
Reorder_Items(\@items, \%in_stock);
}
if( $request eq CATALOG ) {
Build_Catalog(\@items, \%in_stock);
}
else {
Error_Response( ILLEGAL_REQUEST );
}
}
sub Build_Catalog {
my $items = shift;
my $in_stock = shift;
my $catalog_response = '';
foreach my $item ( @$items ) {
$catalog_response .= Catalog_Item($item)
if $in_stock->{$item};
}
return $catalog_response;
}
In testing, Build_Catalog()
works fine. Hooray, we go live with the app.
Oops. For some reason nothing is being ordered, the company runs out of stock of everything.
The Build_Catalog()
routine adds keys to %in_stock
, so Reorder_Items()
now sees everything as in stock and never makes an order.
Using Hash::Util's lock_hash
can help prevent accidental hash modification. If we locked %in_stock
before calling Build_Catalog()
we would have gotten a fatal error and would never have gone live with the bug.
In summary, it is best to test existence of keys rather than truth of your set-hash values. If you are using existence as a signifier, don't set your values to '1' because that will mask bugs and make them harder to track. Using lock_hash
can help catch these problems.
If you must check for the truth of the values, do so in every case.
Upvotes: 1
Reputation: 64919
As others have said, the idiomatic solution for a hashset (a hash that only contains keys, not values) is to use 1 as the value because this makes the testing for existence easy. However, there is something to be said for using undef
as the value. It will force the users to test for existence with exists
which is marginally faster. Of course, you could test for existence with exists
even when the value is 1 and avoid the inevitable bugs from users who forget to use exists
.
Upvotes: 3
Reputation: 30831
It's usually preferable to set a defined value for every key. The idiomatic value (when you don't care about the value) is 1.
my %hash = map { $_ => 1 } @array;
Doing it this way makes the code the uses the hash slightly simpler because you can use $hash{key}
as a Boolean value. If the value can be undefined you need to use the more verbose exists $hash{key}
instead.
That said, there are situations where a value of undef
is desirable. For example: imagine that you're parsing C header files to extract preprocessor symbols. It would be logical to store these in a hash of name => value pairs.
#define FOO 1
#define BAR
In Perl, this would map to:
my %symbols = ( FOO => 1, BAR => undef);
In C a #define
defines a symbol, not a value -- "defined" in C is mapped to "exists" in Perl.
Upvotes: 7
Reputation: 47829
undef
is a value.
Of course, stuff like that is always depndent on what you are currently doing. But $foo{bar}
is just a variable like $bar
and I don't see any reason why either one should not be undef
every now and then.
PS:
That's why exists
exists.
Upvotes: 3
Reputation: 25237
You can't create a hash key without a value. The value can be undef but it will be there. How else would you construct a hash. Or was your question regarding whether the value can be undef? In which case I would say that the value you store there (undef, 1, 0...) is entirely up to you. If a lot of folks are using it then you probably want to store some true value though incase some one else uses if ($hash{$extension}) {...} instead of exists because they weren't paying attention.
Upvotes: 5