finding out common words in various documents python

Question

I have a csv file which is something like this:

 book_store_id  book_price   name

 1              200.0        foo
 1              320.0        bar
 2              220.0        foobar
 2              280.0        foo

So.. this is a huge list..

Now I am trying to find out that if the input is "foo"

then since bookstore id 1 has "foo" marked lowest price, then I want to return "book_store" 1

Now How I am doing this is creating an inverted index which returns something like

foo:1,2 
bar:1
foobar:2

and so on, and then a seperate dictionary where I am maintaining the price of book in a store.. but is this the best way out to solve this out.

What is the most efficient way to solve this problem in python?

Rusty Rob · Accepted Answer

I would create a third data structure (python dict, or database table or whatever).

the data structures key should be the name (assuming name is unique).

The value this "name" key points at should be the minimum price.

Every time you insert a new book, or update the price of a book, look up the books minimum price in the third data structure, if it is less than the minimum price, set the new minimum price.

Don't forget, if you delete a book, or increase its price, make sure you update the minimum. (you could add another column so each book has a boolean "is current minimum". Then on price increase you only need to update the minimum if this is true.

The problem is, then you need to find the next best minimum when you remove the old minimum.

This is where it's good to have a heapq

python has a nice implementation of heapq here: http://docs.python.org/library/heapq.html

Otherwise you have to loop through all values to find the new minimum upon every price increase, or you have to store the 5 best prices each time, say.

have fun :)

finding out common words in various documents python

Answers (2)

Related Questions