Reputation: 1545
I'm working through "Programming Collective Intelligence". In chapter 4, Toby Segaran builds an artificial neural network. The following function appears in the book:
def generatehiddennode(self,wordids,urls):
    if len(wordids)>3: return None
    # Check if we already created a node for this set of words
    sorted_words=[str(id) for id in wordids]
    sorted_words.sort()
    createkey='_'.join(sorted_words)
    res=self.con.execute(
        "select rowid from hiddennode where create_key='%s'" % createkey).fetchone()
    # If not, create it
    if res==None:
        cur=self.con.execute(
            "insert into hiddennode (create_key) values ('%s')" % createkey)
        hiddenid=cur.lastrowid
        # Put in some default weights
        for wordid in wordids:
            self.setstrength(wordid,hiddenid,0,1.0/len(wordids))
        for urlid in urls:
            self.setstrength(hiddenid,urlid,1,0.1)
        self.con.commit()
What I can't understand is the reason for the first line in this function: `if len(wordids)>3: return None`. Is it debug code that needs to be removed later?
P.S. This is not homework.
Upvotes: 4
Views: 426
Reputation: 65854
For a published book, that's pretty terrible code! (You can download all the examples for the book from here; the relevant file is chapter4/nn.py.)
- There's no docstring, so what roles do wordids and urls play?
- The SQL is built by string substitution, which invites SQL injection (wordids probably come from a user query and so may be untrusted; but then, maybe they are ids rather than words, so it's OK in practice, but it's still a very bad habit to get into).
- To check whether a key exists, use SELECT EXISTS(...) rather than asking the database to send you a bunch of records which you're then going to ignore.
- The function silently does nothing if a node already exists for createkey. No error. Is that correct? Who can say?
- The URL weights are a constant 0.1 (perhaps there are always 10 URLs, but it would be better style to scale by len(urls) here).
I could go on and on, but I'd better not.
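To make the two database points concrete, here is a minimal sketch using Python's sqlite3 module. It is not the book's code: the helper names hidden_node_exists and insert_hidden_node are hypothetical, and the one-column hiddennode table is an assumption made so the example runs on its own.

```python
import sqlite3

def hidden_node_exists(con, createkey):
    """Return True if a hidden node with this create_key is already stored."""
    # The "?" placeholder makes sqlite3 bind the value instead of splicing
    # it into the SQL text, so the key cannot change the statement.
    row = con.execute(
        "SELECT EXISTS(SELECT 1 FROM hiddennode WHERE create_key = ?)",
        (createkey,),
    ).fetchone()
    return bool(row[0])

def insert_hidden_node(con, createkey):
    """Insert a hidden node and return its rowid."""
    cur = con.execute(
        "INSERT INTO hiddennode (create_key) VALUES (?)", (createkey,)
    )
    con.commit()
    return cur.lastrowid

# Assumed schema, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hiddennode (create_key TEXT)")

before = hidden_node_exists(con, "1_2_3")
node_id = insert_hidden_node(con, "1_2_3")
after = hidden_node_exists(con, "1_2_3")
# A key containing quote characters is treated as plain data.
hostile = hidden_node_exists(con, "x' OR '1'='1")
```

With the original string-substitution query, that last key would have been pasted into the SQL text and changed the WHERE clause. With a bound parameter it is just a string that matches nothing.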
Anyway, to answer your question, it looks as though this function is adding a database entry for a node in the hidden layer of a neural network. This neural network has, I think, words in the input layer, and URLs in the output layer. The idea of the application is to attempt to train a neural network to find good search results (URLs) based on the words in the query. See the function trainquery, which takes the arguments (wordids, urlids, selectedurl). Presumably (since there's no docstring I have to guess) wordids are the words the user searched for, urlids are the URLs the search engine offered the user, and selectedurl is the one the user picked. The idea is to train the neural net to better predict which URLs users will pick, and so place those URLs higher in future search results.
So the mysterious line of code is preventing nodes being created in the hidden layer with links to more than three nodes in the input layer. In the context of the search application this makes sense: there's no point in training up the network on queries that are too specialized, because these queries won't recur often enough for the training to be worth it.
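The effect of the guard can be seen in isolation with a small sketch. The function name hidden_node_key and the constant MAX_QUERY_WORDS are hypothetical; only the limit of 3 and the sorted, underscore-joined key come from the book's function.

```python
MAX_QUERY_WORDS = 3  # the limit hard-coded in the book's function

def hidden_node_key(wordids, max_words=MAX_QUERY_WORDS):
    """Return the create_key for a query, or None if it has too many words."""
    if len(wordids) > max_words:
        # Mirrors the early return: no hidden node for long queries.
        return None
    # Ids are sorted as strings, so word order in the query does not matter.
    return "_".join(sorted(str(wordid) for wordid in wordids))

same_a = hidden_node_key([12, 7])
same_b = hidden_node_key([7, 12])
too_long = hidden_node_key([1, 2, 3, 4])
```

Here same_a and same_b are the same key, so both queries would share one hidden node, while too_long is None and no node would be created for it.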
Upvotes: 6
Reputation: 40029
You probably should have posted a little more context for the code. Here is the paragraph in Programming Collective Intelligence which immediately precedes that code:
This function will create a new node in the hidden layer every time it is passed a combination of words that it has never seen together before. The function then creates default-weighted links between the words and the hidden node, and between the query node and the URL results returned by this query.
I realize it still doesn't help answer your question, but it would have spared Gareth Rees some guesswork in his answer. Gareth still got it correct, anyway, since he's clever. The intention is to restrict the number of word nodes a hidden node can be associated with, and the author chose the arbitrary number of 3.
Just to agree with Gareth, again, that paragraph should have totally been in the docstring, and the purpose of the line in question should have been in a comment above the line. I hope the next edition isn't so sloppy.
Upvotes: 1
Reputation: 16195
To elaborate on the above comments look at this simple script...
def doSomething(wordids):
    if len(wordids)>3: return None
    print("The rest of the function executes")

blah = [2,3,4]
doSomething(blah)    # three items: the message is printed
blah = [2,3,4,5]
doSomething(blah)    # four items: returns None before the print
So if wordids has more than 3 elements, the function returns None immediately and does nothing else. It is common to check the inputs to functions like this, but in more advanced cases errors are normally handled using exceptions.
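For comparison, here is a minimal sketch of the exception-based style. The name do_something_strict and the max_words parameter are made up for illustration. Whether raising is appropriate depends on whether over-long input is really an error; in the book's function the silent return may well be intentional.

```python
def do_something_strict(wordids, max_words=3):
    """Like doSomething, but rejects over-long input with an exception."""
    if len(wordids) > max_words:
        raise ValueError(
            f"expected at most {max_words} word ids, got {len(wordids)}"
        )
    return "The rest of the function executes"

result = do_something_strict([2, 3, 4])

try:
    do_something_strict([2, 3, 4, 5])
    message = None
except ValueError as exc:
    # The caller decides what to do about the bad input.
    message = str(exc)
```

The difference from returning None is that the caller cannot overlook the rejected input by accident. It has to catch the ValueError or let it propagate.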
Upvotes: 0