user3333975

Suckerupper With Hash Enumeration

I've got some code that a friend of mine helped create:

 1  use LWP::Simple;
 2  use HTML::TreeBuilder;
 3  use Data::Dumper;
 4   
 5  my $tree = url_to_tree( 'http://www.registrar.ucla.edu/schedule/schedulehome.aspx' );
 6   
 7  my @selects  = $tree->look_down( _tag => 'select' );
 8  my @quarters = map { $_->attr( 'value' ) } $selects[0]->look_down( _tag => 'option' );
 9  my @courses  = map { my $s = $_->attr( 'value' ); $s =~ s/&/%26/g; $s =~ s/ /+/g; $s } $selects[1]->look_down( _tag => 'option' );
10   
11  my $n = 0;
12   
13  my %hash;
14   
15  for my $quarter ( @quarters )
16  {
17      for my $course ( @courses )
18      {
19          my $tree_b = url_to_tree( "http://www.registrar.ucla.edu/schedule/crsredir.aspx?termsel=$quarter&subareasel=$course" );
20         
21          my @options = map { my $s = $_->attr( 'value' ); $s =~ s/&/%26/g; $s =~ s/ /+/g; $s } $tree_b->look_down( _tag => 'option' );
22         
23          for my $option ( @options )
24          {
25           
26           
27              print "trying: http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option\n";
28             
29              my $content = get( "http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option" );
30             
31              next if $content =~ m/No classes are scheduled for this subject area this quarter/;
32             
33              $hash{"$course-$option"} = 1;
34              #my $tree_c = url_to_tree( "http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option" );
35             
36              #my $table = ($tree_c->look_down( _tag => 'table' ))[2]->as_HTML;
37             
38              #print "$table\n\n\n\n\n\n\n\n\n\n";
39             
40              $n++;
41          }
42      }
43  }
44   
45  my $hash_count = keys %hash;
46  print "$n, $hash_count\n";
47   
48  sub url_to_tree
49  {
50      my $url = shift;
51     
52      my $content = get( $url );
53   
54      my $tree = HTML::TreeBuilder->new_from_content( $content );
55     
56      return $tree;
57  }

I'm having trouble understanding what lines 33 and 45 are doing. I think I get most of the rest. @selects collects the two select tags (and everything inside them) from the master .aspx page on the site, so I believe the size of @selects is 2. From there, the option values of the 0th element of @selects go into @quarters, and those of the element at position 1 go into @courses. Every unique match is counted, so $n is the total number of courses offered throughout the year. What I don't get is what $hash_count is counting. I suspect it is the number of unique courses offered. So whereas $n is an animal akin to (in pseudocode)

sizeof( ['math1 FALL 2014' , 'math1 SPRING 2014'] ) = 2

I suspect hash_count is an animal like

sizeof( ['math1 FALL 2014' , 'math1 SPRING 2014'] ) = 1

Right?

Answers (2)

Neil Lunn

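The purpose of the hash in this instance is to remove duplicates. The nested loops can reach the same course and option combination several times (once for each quarter it is offered in), but the hash stores each combination only once.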

It's a basic principle. The hash is built up using keys made from your "course" and "option" values. When a key is new, a new entry is created. When the key already exists, its value is simply overwritten, as here:

$hash{"$course-$option"} = 1;

At the end, the keys function gets all the keys of the hash. In this (scalar) context it just returns the number of keys, hence the count.

my $hash_count = keys %hash;

Basically the code is removing duplicates.
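To see the difference between $n and $hash_count without hitting the registrar's site, here is a minimal sketch that replaces the web requests with a hard-coded list of (quarter, course, option) triples. The data and the helper name count_pages are made up for illustration; only the counting logic mirrors lines 33, 40 and 45 of the question's code.

```perl
use strict;
use warnings;

# Invented stand-in data: [quarter, course, option] for each page that
# listed classes. These are not real registrar values.
my @found = (
    [ '14F', 'MATH', '0001' ],
    [ '14S', 'MATH', '0001' ],    # same course/option, different quarter
    [ '14F', 'MATH', '0031A' ],
);

# Hypothetical helper that repeats the counting done in the original loop.
sub count_pages {
    my @pages = @_;
    my $n     = 0;
    my %hash;
    for my $page (@pages) {
        my ( $quarter, $course, $option ) = @$page;

        # As on line 33: the key leaves out $quarter, so the same
        # course/option pair found in two quarters yields one key.
        $hash{"$course-$option"} = 1;

        # As on line 40: every page is counted, repeats included.
        $n++;
    }

    # As on line 45: keys in scalar context returns the number of keys.
    my $hash_count = keys %hash;
    return ( $n, $hash_count );
}

my ( $n, $hash_count ) = count_pages(@found);
print "$n, $hash_count\n";    # prints "3, 2"
```

If the key also included the quarter (for example "$quarter-$course-$option"), both numbers would be 3, which is one way to check the guess in the question.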

Some reading on hashes might help.

But here are the basics:

Say we already have a hash defined like this:

my %hash = ( one => 1, two => 2, three => 3 );

We can assign a new value to the hash like this:

$hash{"four"} = 4;

And the new contents will be:

( one => 1, two => 2, three => 3, four => 4 )

But if we use a key that already exists, like this:

$hash{"two"} = 5;

The resulting contents will be this

( one => 1, two => 5, three => 3, four => 4 )

So we don't add an additional entry; the existing key simply has its value updated. There is only one entry for "two", and there are no duplicate "two" keys.

As in the final part of the code, we can get the keys of the hash like this:

my @keys = keys %hash;

And this will return a list that looks like this:

( 'one', 'two', 'three', 'four' )

They won't necessarily be in that order, since a hash has no defined ordering, but let's not complicate things. If, however, we assign the result to something that won't accept a list, as here:

my $count = keys %hash;

Then what is returned is the number of keys in the hash:

print "$count\n";

Will output 4 as the result.
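Put together, the walkthrough above runs as a short script. This is just a sketch collecting those snippets; sort is added only so the printed key order is predictable, since hash order is not defined.

```perl
use strict;
use warnings;

my %hash = ( one => 1, two => 2, three => 3 );

$hash{"four"} = 4;    # new key: a fourth entry is added
$hash{"two"}  = 5;    # existing key: its value is replaced, no new entry

# List context: all the keys. Sorted here only for a stable display.
my @keys = sort keys %hash;
print "@keys\n";       # prints "four one three two"

# Scalar context: the number of keys.
my $count = keys %hash;
print "$count\n";      # prints "4"
print "$hash{two}\n";  # prints "5"
```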

The code combines the "course" and "option" values and makes sure each combination is counted only once by storing it as a key in the hash. Finally, it assigns the count of the keys to your variable $hash_count and prints the result.

Daniel Standage

  • Line 33 stores the string "$course-$option" as a key in the hash, with 1 as its associated value. Why? Hashes provide a convenient and fast mechanism for lookups. Those values could instead have been stored in an array, but subsequent lookups (to test whether a given key has been seen before) would not be nearly as quick, because an array has to be scanned element by element.
  • Line 45 is a syntactically dense statement, but it is essentially storing the number of keys in the hash. In list context, the keys function returns a list of all of the keys in the hash. However, since the result is being assigned to a scalar ($hash_count), keys is called in scalar context, and in scalar context it returns the number of keys in the hash.
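Both points can be checked with a small sketch. The keys below are made up, in the "$course-$option" form from line 33. The code puts a hash lookup with exists next to a linear scan of an array with grep, and shows keys returning a count when assigned to a scalar.

```perl
use strict;
use warnings;

# Hypothetical keys in the "$course-$option" form; one is repeated.
my @seen_list = ( 'MATH-0001', 'MATH-0031A', 'MATH-0001' );

# Hash: build it once; each lookup is then a single hash access.
my %seen;
$seen{$_} = 1 for @seen_list;
my $in_hash = exists $seen{'MATH-0001'} ? 1 : 0;

# Array: each lookup has to walk through the elements one by one.
my $in_array = ( grep { $_ eq 'MATH-0001' } @seen_list ) ? 1 : 0;

# keys in scalar context returns the count directly.
my $count = keys %seen;           # 2: the repeated key collapsed
my $also  = scalar keys %seen;    # same thing, with an explicit scalar()
print "$in_hash $in_array $count $also\n";    # prints "1 1 2 2"
```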
