Reputation: 4850
Using update seems like it should be pretty straightforward, and I think I'm using it correctly, so the error must have to do with types or something else.
But anyway, here's the situation:
I'm doing coursework for a Coursera course (needless to say, answers that minimize or hide code are most helpful!) and am stuck on the last problem. The task is to return a set containing all the documents that contain all the words in a query. The function takes an inverseIndex, a dictionary with words as keys and the documents containing those words as values, e.g. {'a':[0,1],'be':[0,1,4].....}
The way I've attempted to implement this is pretty simple: build a collection of sets, where each set contains the document IDs for one word, and then call .intersection(*sets) to reduce them to a set containing only the IDs of docs that contain all the words in the query.
def andSearch(inverseIndex, query):
    sets = set()
    s = set()
    for word in query:
        s.update(inverseIndex[word])
        print(inverseIndex[word])
    print s
    s.intersection(*sets)
    return s
Unfortunately, this returns all the documents in the inverseIndex when it should only return the index '3'.
terminal output:
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
[0, 1, 3, 4]
[2, 3, 4]
set([0, 1, 2, 3, 4])
What's wrong?
Thanks so much!
Update: I changed the function body to:
    sets = []
    s = set()
    for word in query:
        sets.append(inverseIndex[word])
    print sets
    s.intersection(*sets)
    return s
Output:
[[0, 1, 2, 3, 4], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3], [0, 1, 3, 4], [2, 3, 4]]
set([])
Upvotes: 2
Views: 641
Reputation: 13410
You use update inside the loop. So, on each iteration you add the new pages to s. But you need to intersect those pages, because you want only the pages that contain all the words (not 'at least one word'). So you need to intersect on each iteration instead of updating.
Also, I don't see why you need sets at all.
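A minimal sketch of the per-iteration intersection described above. The function name and_search_loop is made up for illustration, the sample index reuses the two entries from the question's example, and the code is written so it also runs on Python 3:

```python
def and_search_loop(inverse_index, query):
    """Return the set of document IDs that contain every word in query."""
    result = None  # no word processed yet
    for word in query:
        pages = set(inverse_index[word])
        if result is None:
            # Seed with the first word's pages; starting from an empty
            # set would make every later intersection empty.
            result = pages
        else:
            # Narrow in place to the pages that also contain this word.
            result.intersection_update(pages)
    # An empty query is treated here as matching nothing (an assumption).
    return result if result is not None else set()


# The two entries shown in the question's example index.
sample_index = {'a': [0, 1], 'be': [0, 1, 4]}
print(and_search_loop(sample_index, ['a', 'be']))
```

The key point is that the running result is seeded from the first word rather than from an empty set, and that intersection_update (unlike intersection) modifies the set in place.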
This should work:
def andSearch(inverseIndex, query):
    return set.intersection(*(set(inverseIndex[word]) for word in query))
This just builds one set per word in the query (shown here as a list, with ii standing for the inverseIndex):
>>> [set(ii[word]) for word in query]
[set([0, 1]), set([0, 1, 4])]
And then I just call set.intersection to intersect them all.
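To see the same call on the data from the question, here is a small sketch that applies set.intersection to the six document-ID lists copied from the question's terminal output. The variable names are made up for illustration:

```python
# Document-ID lists copied from the question's terminal output,
# one list per query word.
page_lists = [
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3],
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3],
    [0, 1, 3, 4],
    [2, 3, 4],
]

# Convert each list to a set, then unpack the sets as arguments.
page_sets = [set(pages) for pages in page_lists]
common = set.intersection(*page_sets)
print(common)
```

Only document 3 appears in every list, which is the result the question expects. Note that with an empty query there is nothing to unpack and set.intersection() raises a TypeError, so that case would need its own guard.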
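About the update to your question: the result is empty because s is empty, and intersecting an empty set with anything gives an empty set. Note also that s.intersection(...) returns a new set rather than modifying s, so in your code its result is discarded.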
Consider this example:
>>> s = set()
>>> s.intersection([1,2,3],[2,3,4])
set([])
To intersect sets, just use set.intersection. Called on the class like this, it requires its first argument to be a set (the remaining arguments can be any iterable). So you should convert the lists of pages to sets of pages, or keep the pages as sets in the dictionary.
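A short sketch of how intersection treats its arguments and its receiver, as far as I know the behaviour in CPython. The variable names are made up for illustration:

```python
s = set()

# intersection() returns a new set and leaves s untouched. Because s is
# empty, the result is empty no matter what it is intersected with.
result = s.intersection([1, 2, 3], [2, 3, 4])

# Called on the class, the first argument stands in for the set itself
# and must be a set; the other arguments may be any iterable.
mixed = set.intersection(set([1, 2, 3]), [2, 3, 4])

# Passing a list in the first position is rejected with a TypeError.
try:
    set.intersection([1, 2, 3], [2, 3, 4])
    list_first_ok = True
except TypeError:
    list_first_ok = False

print(result)
print(mixed)
print(list_first_ok)
```

Converting every list to a set up front avoids having to think about which position a list ends up in.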
Upvotes: 2