Reputation: 991

Is it better to check Perl hash keys for truth or for existence?

Is it preferable, when assigning to a hash of just keys (where the values aren't really needed), to say:

$hash{$new_key} = "";

Or to say:

$hash{$new_key} = 1;

One necessitates that you check for a key with exists, the other allows you to say either:

if (exists $hash{$some_key})

if ($hash{$some_key})

I would think that assigning a 1 would be better, but are there any problems with this? Does it even matter?

Upvotes: 15

Answers (8)

Axeman

Reputation: 29854

I usually check for defined values. That's the middle case that you're leaving out. Not quite "truth" not quite "exists" either. (Mostly, but not quite.)

Now in theory, the more general way is exists, as in

return 'strawberry' if exists $hash{$key};

This covers the case where the key exists and the value is 0, or when the key has been assigned undef. The key just needs to exist to pass this test.
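
To make the three tests concrete, here is a minimal sketch of how exists, defined, and a plain truth test can disagree. The hash contents, the key names, and the helper `key_status` are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical data: keys chosen only to illustrate each case.
my %hash = (one => 1, zero => 0, empty => '', unset => undef);

# Returns a three-element list of 0/1 flags: (exists, defined, true).
sub key_status {
    my ($h, $key) = @_;
    return (
        exists $h->{$key}  ? 1 : 0,
        defined $h->{$key} ? 1 : 0,
        $h->{$key}         ? 1 : 0,
    );
}

for my $key (qw(one zero empty unset missing)) {
    printf "%-8s exists=%d defined=%d true=%d\n", $key, key_status(\%hash, $key);
}
```

Only the `unset` key separates exists from defined, and only `zero` and `empty` separate defined from truth.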

However, I have rarely found the need to test the existence of a key.

Hashes are often part of a defined API, and if you're processing them, you have some idea of the range of values that can be stored. The configuration item will be looking for specific things; and as unordered parameter keys, subroutines will be looking for specific things.
I find the idea of an "infinite table" a very flexible concept. And exists x <=> defined x works for that. Every conceivable value is "set" in the table, but only a finite number of keys are defined, the rest are considered to be undefined.

As a result, unless a value is defined in a hash, I usually don't care what it is. I consider it a false value. Storing undef and not storing anything at all are equivalent in most things that I write. This is further motivated by the item below.
Most of the time that I might need to know if a key is in the table, I need to use it for something else. First I store the value locally, and then test it for a defined value.
```
 my $value = $hash{$key};
 if ( defined $value ) { 
     push @valid_values, $value;
 }
```
If I could be sure that there was some local common-subexpression optimization between the lookup for exists and the lookup to use the value, then I wouldn't be so picky about this. But I don't like to retrieve from a hash more than once. So I 1) cache the value and 2) check it--every time.

That said, I can tighten the criteria if I know that the value should not be 0, such as in a lookup or a parameter table. So I sometimes test for truth. But I also can tighten up the test for anything, anyway.

     if ( ( $hash{$key} || '' ) =~ m/^(?:Bears|Lions|Packers|Vikings)$/ ) { 
         $nfc_north++;
     }

Of course an operating principle here is that defined works for "unlimited" tables. Where every conceivable value is "set" in the table, but only a finite number of keys are defined.

There is a case where you might be working on a totally anonymous hash. But then, what's your interest in the keys that can't be satisfied with keys or values? Even if you're making an all-purpose hash "convenience function", it's better not to concern yourself with the existence of particular keys, so that you stay totally neutral to what somebody else has stored there.

Upvotes: 0

daotoad

Reputation: 27183

* Update: * Sinan points out that my cautious approach to hash element creation is dated and not an issue on newer Perls. I've edited my post below, and added some new thoughts on the matter.

The problem with just testing for truth is that you can modify the hash with the crufty old version of Perl that I learnt on. This code is safe with Perl 5.8:

my %foo = ();

if( $foo{bar} ) {
   print "never happens";
}

print keys %foo;

~~This is the bad part of the mixed blessing of auto-vivification (over all I like auto-viv, but this is where it hurts).~~

In many situations, this is no big deal. But it is a potential issue to be aware of. I address this in my code by locking any hash that must remain unmodified.
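
The answer does not say how the hash is locked. One possibility, assuming the core Hash::Util module is what is meant, is `lock_keys`. A minimal sketch with made-up hash contents:

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %foo = (bar => 1);   # hypothetical contents
lock_keys(%foo);        # the set of keys is now fixed

# Any attempt to add a new key dies instead of silently changing the hash.
my $ok = eval { $foo{baz} = 1; 1 };
print "store refused: $@" unless $ok;

print join(',', sort keys %foo), "\n";
```

The eval is only there so the sketch can report the failure and carry on. In real code you would normally let the exception propagate.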

In practice I wind up always doing an exists test before a boolean test as well.

if( exists $foo{bar} and $foo{bar} ) {
    # hash is not modified due to short circuit
}

The same kind of alteration of data structures can occur with arrays. If you access $foo[2000], then the array will be extended. So it can be a good idea to test for existence before you accidentally extend an array. In practice this has been much less of an issue than the corresponding hash behavior. <-- The irony here is that you can only use exists on an array on perls 5.6 and newer, where presumably this problem has been fixed.

If I need to go digging into data structures, I use Data::Diver. It automatically checks existence at each level in the structure to prevent accidental alteration of your data structure.
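
The level-by-level check can be sketched by hand in plain Perl. This is not Data::Diver's API, only an illustration of the behavior it guards against, and the hash and key names are hypothetical:

```perl
use strict;
use warnings;

my %data;   # hypothetical, starts empty

# Testing a nested key creates the intermediate level as a side effect.
my $found = exists $data{outer}{inner};
print "outer was created\n" if exists $data{outer};

# Checking each level in turn leaves the structure untouched.
my %clean;
my $found_safely = exists $clean{outer} && exists $clean{outer}{inner};
print "clean is still empty\n" unless %clean;
```

The short circuit in the second test is what stops Perl from ever dereferencing `$clean{outer}`.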

The most important thing is to be consistent within each script/program. The easiest way to run into problems is to test for existence here, but truth there. Especially if you are accessing the same hash for both sets of tests.

Final thoughts on my update regarding autovivification: A flurry of research showed several things. I should have tested my code before posting--by failing to do so, I spread misinformation, which I apologize for. I also discovered that there are still some sneaky issues with autovivification lingering--enough that there is an open todo item to make things right. So, while it may be wrong-headed, old-fashioned and dumb, I will continue to explicitly take steps to control autovivification and restrict it to occurring only when I want it to occur. FWIW, autovivification is a great thing when it works. I think special casing if to prevent autoviv is the right thing to do--it gets rid of the need for a lot of extra code, but I wish I could find some docs that detailed that behavior.

Upvotes: 3

Sinan Ünür

Reputation: 118128

Upvotes: 2

KingPong

Reputation: 1459

If you're trying to save memory (which generally only matters if you have a very large hash), you can use undef as the value and just test for its existence. Undef is implemented as a singleton, so thousands of undefs are all just pointers to the same value. Setting each value to the empty string or 1 would allocate a different scalar value for each element.

my %exists;
@exists{@list} = ();
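
A short sketch of that slice assignment in use, with a made-up list. Because every value is undef, only exists gives a useful answer:

```perl
use strict;
use warnings;

my @list = qw(apple banana cherry);   # hypothetical input

my %exists;
@exists{@list} = ();   # every key is created with an undef value

print "banana is present\n"    if exists $exists{banana};
print "truth test misses it\n" unless $exists{banana};
print "durian is absent\n"     unless exists $exists{durian};
```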

In light of your later comment about your intended use, this is the idiom I've seen and used many times:

my %seen;
while (<>) {
    next if $seen{$_}++; # false the first time, true every successive time
    # ...process line...
}

Upvotes: 9

P Shved

Reputation: 99254

Assume you actually needed to check existence of keys, but you wrote code that checks for truth. It checks for truth throughout your program in various places. Then it suddenly appears that you misunderstood something and you should actually store a mapping from your keys to string values; the strings should be used in the same dataflow as you've already implemented.

And the strings can be empty!

Hence you should either refactor your program or create another hash, because truth checks no longer check existence. That wouldn't happen if you checked for existence from the very beginning.
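
A minimal sketch of that failure mode, using a hypothetical mapping in which one value is empty and another is the string "0":

```perl
use strict;
use warnings;

# Hypothetical mapping whose values may legitimately be empty or "0".
my %label = (alpha => 'first', beta => '', gamma => '0');

my @by_truth  = grep { $label{$_} }        sort keys %label;
my @by_exists = grep { exists $label{$_} } sort keys %label;

print "truth:  @by_truth\n";
print "exists: @by_exists\n";
```

The truth test reports only `alpha`, while the exists test reports all three keys.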

(edited coz dunno why got voted down.)

Upvotes: 3

Chris Simmons

Reputation: 1853

When the values aren't needed, you'll often see this idiom:

my %exists;
$exists{$_}++ for @list;

Which has the effect of setting each value to 1 (or to the number of occurrences, if an item appears in the list more than once).
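
A small sketch with a made-up list shows what the increment stores when items repeat:

```perl
use strict;
use warnings;

my @list = qw(a b a c a);   # hypothetical input with repeats

my %exists;
$exists{$_}++ for @list;

print "$_ => $exists{$_}\n" for sort keys %exists;
```

Every stored value is at least 1, so both a truth test and an exists test work on this hash.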

Upvotes: 11

Samachar Patra

Reputation: 81

As the prior answer says, it depends on what you are trying to achieve; if you are just trying to get (for instance) unique values from some set (whose elements then form the keys), you can just use exists (could also help to catch duplicates if you check for exists first before assigning a value).

Without knowing the application, it's difficult to be more specific.

Upvotes: 2

brian d foy

Reputation: 132802

It depends on whether you need the key to exist or to have a true value. Test for the thing you need. If you are using a hash merely to see if something is in a list, exists() is the way to go. If you are doing something else, checking the value might be the way to go.

Upvotes: 16
