Reputation:
I'm writing a small algorithm in PHP that goes through n number of movies with ratings, and will store the top 5. I'm not reading from a datafile, but from a stream so I cannot simply order the movies by rating.
My question is what is the most efficent way to keep track of the top 5 rated movies as I read the stream? Currently I do the following:
My method works, but requires a sort on the list after every read. I believe this to be an expensive method mostly due to the fact that every time I use array_multisort() I must do a for loop on 5 movies just to build the index to sort on. Can anyone suggest a better way to approach this?
Upvotes: 1
Views: 665
Reputation: 10033
Maybe this can be of help.
class TopList {
private $items = array();
private $indexes = array();
private $count = 0;
private $total = 5;
private $lowest;
private $sorted = false;
public function __construct($total = null) {
if (is_int($total))
$this->total = $total;
$this->lowest = -1 * (PHP_INT_MAX - 1);
}
public function addItem($index, $item) {
if ($index <= $this->lowest)
return;
$setLowest = $this->count === $this->total;
if ($setLowest) {
/* //remove first added
$lowestIndex = array_search($this->lowest, $this->indexes);
/*/ //remove last added
$lowestIndex = end(array_keys($this->indexes, $this->lowest));
//*/
unset($this->indexes[$lowestIndex], $this->items[$lowestIndex]);
} else {
++$this->count;
$setLowest = $this->count === $this->total;
}
$this->indexes[] = $index;
$this->items[] = $item;
$this->sorted = false;
if ($setLowest)
$this->lowest = min($this->indexes);
}
public function getItems() {
if (!$this->sorted) {
array_multisort($this->indexes, SORT_DESC, $this->items);
$this->sorted = true;
}
return $this->items;
}
}
$top5 = new TopList(5);
foreach ($movies as $movie) {
$top5->addItem($movie['rating'], $movie);
}
var_dump($top5->getItems());
Upvotes: 0
Reputation: 10092
Here’s what I would do:
// let’s say get_next_movie () returns array with 'rating' and 'name' keys
while ($m = get_next_movie ()) {
$ratings[$m['rating']][] = $m['movie'];
$temp_ratings = $ratings;
$top5 = array ();
$rating = 5;
while (1) {
if (count ($temp_ratings[$rating])) {
$top5[] = array_shift ($temp_ratings[$rating]);
} elseif ($rating > 0) {
--$rating;
} else {
break;
}
}
// $top5 has current top 5 :-)
}
$ratings array looks like this, each rating has array of movies inside:
Array
(
[5] => Array
(
[0] => Five!
)
[3] => Array
(
[0] => Three
[1] => Threeeeee
[2] => Thr-eee-eee
)
[4] => Array
(
[0] => FOR
)
)
Upvotes: 0
Reputation: 134631
Upvotes: 0
Reputation: 882028
No point in re-sorting after every read since you really only need to insert a new entry. Use the following algorithm, it's likely to get you the best speed. It's basically an unrolled loop, not the most beautiful code.
set movies[0..4].rating to -1.
while more movies in stream:
read in next movie.
if movie.rating < movies[0].rating:
next while
if movie.rating < movies[1].rating:
movies[0] = movie
next while
if movie.rating < movies[2].rating:
movies[0] = movies[1]
movies[1] = movie
next while
if movie.rating < movies[3].rating:
movies[0] = movies[1]
movies[1] = movies[2]
movies[2] = movie
next while
if movie.rating < movies[4].rating:
movies[0] = movies[1]
movies[1] = movies[2]
movies[2] = movies[3]
movies[3] = movie
next while
movies[0] = movies[1]
movies[1] = movies[2]
movies[2] = movies[3]
movies[3] = movies[4]
movies[4] = movie
At the end, you have your sorted list of movies. If there's less than 5, those others will have a rating of -1 so you'll know they're invalid. This is assuming that the rating on a real movie is zero or greater but you can adjust the values if they're not.
If you need to adjust it for more than 5 movies, you can. The best bet would be to roll up the loop again. At some point, however, it's going to become more efficient to sort it than use this method. This method's only really good for a small data set.
Upvotes: 3
Reputation: 1109
My method works, but requires a sort on the list after every read.
No it doesn't, it only requires a sort after you find a new movie whos rating is > movies[0][rating].
This method seems efficient to me. You only sort occasionally when there's a new entry for the top 5, which will happen less the more movies you process.
Upvotes: 1
Reputation: 1314
How big is the list? I'm guessing it's not an option to keep the entire list in memory, and sort it at the end?
Upvotes: 0
Reputation: 20151
Linked lists would work here.
Build a linked list that chains the first 5 movies in the correct order. For each new movie, just start at the the end of the chain and walk it until your movie is between one with a higher rating and one with a lower rating. Then insert your link into the list here. If the movie was better than the worst (and thus your list is now 6 long), just remove the last link in the chain, and you are back to 5.
No sorting, no indexing.
Upvotes: 4
Reputation: 111200
Your algorithm looks fine. I am not sure how the arrays are implemented in PHP. From an algorithm point of view: use a heap instead of an array.
Upvotes: 3