I've got some code that a friend of mine helped create:
1 use LWP::Simple;
2 use HTML::TreeBuilder;
3 use Data::Dumper;
4
5 my $tree = url_to_tree( 'http://www.registrar.ucla.edu/schedule/schedulehome.aspx' );
6
7 my @selects = $tree->look_down( _tag => 'select' );
8 my @quarters = map { $_->attr( 'value' ) } $selects[0]->look_down( _tag => 'option' );
9 my @courses = map { my $s = $_->attr( 'value' ); $s =~ s/&/%26/g; $s =~ s/ /+/g; $s } $selects[1]->look_down( _tag => 'option' );
10
11 my $n = 0;
12
13 my %hash;
14
15 for my $quarter ( @quarters )
16 {
17 for my $course ( @courses )
18 {
19 my $tree_b = url_to_tree( "http://www.registrar.ucla.edu/schedule/crsredir.aspx?termsel=$quarter&subareasel=$course" );
20
21 my @options = map { my $s = $_->attr( 'value' ); $s =~ s/&/%26/g; $s =~ s/ /+/g; $s } $tree_b->look_down( _tag => 'option' );
22
23 for my $option ( @options )
24 {
25
26
27 print "trying: http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option\n";
28
29 my $content = get( "http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option" );
30
31 next if $content =~ m/No classes are scheduled for this subject area this quarter/;
32
33 $hash{"$course-$option"} = 1;
34 #my $tree_c = url_to_tree( "http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option" );
35
36 #my $table = ($tree_c->look_down( _tag => 'table' ))[2]->as_HTML;
37
38 #print "$table\n\n\n\n\n\n\n\n\n\n";
39
40 $n++;
41 }
42 }
43 }
44
45 my $hash_count = keys %hash;
46 print "$n, $hash_count\n";
47
48 sub url_to_tree
49 {
50 my $url = shift;
51
52 my $content = get( $url );
53
54 my $tree = HTML::TreeBuilder->new_from_content( $content );
55
56 return $tree;
57 }
I'm having trouble understanding what lines 33 and 45 are doing. I think I get what most of the rest is doing, namely that @selects collects the select elements (and everything inside them) from the master .aspx page on the website under consideration; I think the size of @selects is 2. I also get that from this point the option values in the 0-th slot of @selects are passed into @quarters, and similarly those in the position-1 slot are passed into @courses. Every unique match is enumerated, and so $n is the total number of courses offered throughout the year. What I don't get is what $hash_count is counting. I suspect it is the number of unique courses offered, so whereas $n is something akin to (in pseudocode)
sizeof( ['math1 FALL 2014' , 'math1 SPRING 2014'] ) = 2
I suspect $hash_count is something like
sizeof( ['math1 FALL 2014' , 'math1 SPRING 2014'] ) = 1
Right?
The purpose of the hash in this instance is to remove duplicates from the combinations of the two arrays you are processing.
It's a basic principle: the hash is built up from your "course" and "option" elements. When a key is new, a new entry is created. When the key already exists, its value is just updated, as here:
$hash{"$course-$option"} = 1;
At the end, the keys function gets all the keys of the hash that was built. In this (scalar) context it just returns the number of keys, hence the count.
my $hash_count = keys %hash;
Basically, the code is removing duplicates.
Some reading on hashes is recommended, but here are the basics.
Say we already have a hash defined like this:
my %hash = ( one => 1, two => 2, three => 3 );
We can assign a new value to the hash like this:
$hash{"four"} = 4;
And the new contents will be:
( one => 1, two => 2, three => 3, four => 4 )
But if we use a key that already exists, like this:
$hash{"two"} = 5;
The resulting contents will be this:
( one => 1, two => 5, three => 3, four => 4 )
So we don't add an additional entry; the existing key simply has its value updated. There is only one entry for "two", and there are no duplicate "two" keys.
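For anyone who wants to try this, here is a small runnable sketch of the same example using only core Perl. Note that hash elements are accessed with curly braces; square brackets would index the unrelated array @hash instead.

```perl
use strict;
use warnings;

# Start with three key/value pairs.
my %hash = ( one => 1, two => 2, three => 3 );

# A key that is not there yet adds a new entry.
$hash{four} = 4;

# A key that already exists is overwritten; no second "two" entry appears.
$hash{two} = 5;

# Print the pairs in sorted key order (the hash's own order is not guaranteed).
for my $key ( sort keys %hash ) {
    print "$key => $hash{$key}\n";
}
```

Running it prints four => 4, one => 1, three => 3 and two => 5, one pair per line: still four entries, with the value for "two" replaced.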
As in the final part of the code, we can get the keys of the hash like this:
my @keys = keys %hash;
And this will return a list that looks like this:
( 'one', 'two', 'three', 'four' )
They won't necessarily be in that order (Perl does not guarantee any ordering of hash keys), but let's not complicate things. If instead we assign the result to something that does not accept a list, such as a scalar, as here:
my $count = keys %hash;
Then what is returned is the number of items contained within the hash:
print "$count\n";
will output 4 as the result.
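A minimal sketch contrasting the two contexts: the same keys call returns the list of keys when assigned to an array, and the count when assigned to a scalar. scalar() can be used to force the count where a list would otherwise be expected, such as inside print.

```perl
use strict;
use warnings;

my %hash = ( one => 1, two => 2, three => 3, four => 4 );

# List context: keys returns every key, in no guaranteed order.
my @keys = keys %hash;

# Scalar context: the same call returns the number of keys.
my $count = keys %hash;

print "keys: ", join( ', ', sort @keys ), "\n";   # keys: four, one, three, two
print "count: $count\n";                          # count: 4

# print supplies list context, so scalar() is needed to get the count here.
print "also: ", scalar( keys %hash ), "\n";       # also: 4
```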
The code collects the combined "course" and "option" values and makes sure each combination is counted only once by storing it as a key in the hash. Finally, it assigns the number of keys to your variable $hash_count and prints the result.
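To check the guess in the question, here is a minimal offline sketch of the same loop structure. The quarter names, subject area and course options are made-up placeholders, and the nested hash %offered is a hypothetical stand-in for the pages the real script downloads. Because the key is "$course-$option" with no quarter in it, a course offered in two quarters is counted twice in $n but only once in $hash_count. (The real script's next for empty pages is left out; there it skips both counters.)

```perl
use strict;
use warnings;

# Hypothetical stand-in for what the scraper finds:
# quarter => { subject area => [ course options ] }
my %offered = (
    'FALL 2014'   => { MATH => [ 'math1', 'math2' ] },
    'SPRING 2014' => { MATH => [ 'math1' ] },
);

my $n = 0;    # counts every (quarter, course, option) combination visited
my %hash;     # keyed on "course-option" only; the quarter is NOT part of the key

for my $quarter ( sort keys %offered ) {
    for my $course ( sort keys %{ $offered{$quarter} } ) {
        for my $option ( @{ $offered{$quarter}{$course} } ) {
            $hash{"$course-$option"} = 1;   # a repeated key just overwrites
            $n++;
        }
    }
}

my $hash_count = keys %hash;    # scalar context: number of distinct keys
print "$n, $hash_count\n";      # prints "3, 2"
```

So the intuition in the question holds for this key format: math1 in fall and in spring adds 2 to $n but only 1 to $hash_count.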
Line 33 stores "$course-$option" as a key in the hash, with 1 as its associated value. Why? Hashes provide a convenient and quick mechanism for lookups. Those values could instead have been stored in an array, but subsequent lookups (to test whether a given key has been seen before) would not be nearly as quick.
On line 45, the keys function returns a list containing (you guessed it) all of the keys in the hash. However, since the result is being assigned to a scalar ($hash_count), keys is evaluated in scalar context, and in scalar context it returns the number of keys in the hash.
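A minimal sketch of the trade-off described above, using made-up key strings: de-duplicating with an array means scanning the array on every check, while a hash looks the key up directly. The grep { !$seen{$_}++ } line is a common Perl idiom for removing duplicates while keeping first-seen order.

```perl
use strict;
use warnings;

# Made-up keys, with one duplicate.
my @keys = ( 'MATH-math1', 'MATH-math2', 'MATH-math1' );

# Array approach: every check scans the whole list built so far (linear time).
my @seen_array;
for my $k (@keys) {
    push @seen_array, $k unless grep { $_ eq $k } @seen_array;
}

# Hash approach: one keyed lookup per item (constant time on average).
# $seen{$_}++ is false (undef) the first time a key appears, true afterwards.
my %seen;
my @unique = grep { !$seen{$_}++ } @keys;

print "array: @seen_array\n";   # array: MATH-math1 MATH-math2
print "hash:  @unique\n";       # hash:  MATH-math1 MATH-math2
```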