rocket
rocket

Reputation:

Keeping top 5 values in PHP efficently

I'm writing a small algorithm in PHP that goes through n number of movies with ratings, and will store the top 5. I'm not reading from a datafile, but from a stream so I cannot simply order the movies by rating.

My question is what is the most efficent way to keep track of the top 5 rated movies as I read the stream? Currently I do the following:

  1. Read in 5 movies (into an array called movies[]), with two keys movies[][name] and movies[][rating]
  2. Order the array by movies[rating] using array_multisort() (highest rating now sits at movies[4])
  3. Read in the next movie
  4. If this new movie rating > movies[0][rating] then replace movies[0] with this new movie
  5. Re-order the list
  6. Repeat 3-5 until finished

My method works, but requires a sort on the list after every read. I believe this to be an expensive method mostly due to the fact that every time I use array_multisort() I must do a for loop on 5 movies just to build the index to sort on. Can anyone suggest a better way to approach this?

Upvotes: 1

Views: 665

Answers (8)

OIS
OIS

Reputation: 10033

Maybe this can be of help.

class TopList {
    private $items = array();
    private $indexes = array();
    private $count = 0;
    private $total = 5;
    private $lowest;
    private $sorted = false;

    public function __construct($total = null) {
        if (is_int($total))
            $this->total = $total;

        $this->lowest = -1 * (PHP_INT_MAX - 1);
    }

    public function addItem($index, $item) {
        if ($index <= $this->lowest)
            return;

        $setLowest = $this->count === $this->total;
        if ($setLowest) {
            /* //remove first added
            $lowestIndex = array_search($this->lowest, $this->indexes);
            /*/ //remove last added
            $lowestIndex = end(array_keys($this->indexes, $this->lowest));
            //*/
            unset($this->indexes[$lowestIndex], $this->items[$lowestIndex]);
        } else {
            ++$this->count;
            $setLowest = $this->count === $this->total;
        }

        $this->indexes[] = $index;
        $this->items[] = $item;
        $this->sorted = false;

        if ($setLowest)
            $this->lowest = min($this->indexes);
    }

    public function getItems() {
        if (!$this->sorted) {
            array_multisort($this->indexes, SORT_DESC, $this->items);
            $this->sorted = true;
        }
        return $this->items;
    }
}

$top5 = new TopList(5);
foreach ($movies as $movie) {
    $top5->addItem($movie['rating'], $movie);
}
var_dump($top5->getItems());

Upvotes: 0

Ilya Birman
Ilya Birman

Reputation: 10092

Here’s what I would do:

// let’s say get_next_movie () returns array with 'rating' and 'name' keys

while ($m = get_next_movie ()) {

  $ratings[$m['rating']][] = $m['movie'];

  $temp_ratings = $ratings;
  $top5 = array ();
  $rating = 5;
  while (1) {
    if (count ($temp_ratings[$rating])) {
      $top5[] = array_shift ($temp_ratings[$rating]);
    } elseif ($rating > 0) {
      --$rating;
    } else {
      break;
    }
  }

  // $top5 has current top 5 :-)

}

$ratings array looks like this, each rating has array of movies inside:

Array
    (
    [5] => Array
        (
            [0] => Five!
        )

    [3] => Array
        (
            [0] => Three
            [1] => Threeeeee
            [2] => Thr-eee-eee
        )

    [4] => Array
        (
            [0] => FOR
        )
    )

Upvotes: 0

vartec
vartec

Reputation: 134631

  1. there is no need for two keys in array. array with name as key, and rating as value will do. Sort it with arsort();
  2. the algorithm is not perfect, you can do it optimally with linked list. Although I think linked list implemented in PHP will be actually slower that function call to asort() for 6 elements. For big O estimation, you can assume that sorting 6 elements has constant time.
  3. You'll only sort when you encounter movie rated higher then the actual, so in average case you'll do it less an less often, while progressing. You'll sort on every movie only in worst case scenario of having initial list sorted from lowest rated.

Upvotes: 0

paxdiablo
paxdiablo

Reputation: 882028

No point in re-sorting after every read since you really only need to insert a new entry. Use the following algorithm, it's likely to get you the best speed. It's basically an unrolled loop, not the most beautiful code.

set movies[0..4].rating to -1.
while more movies in stream:
    read in next movie.
    if movie.rating < movies[0].rating:
        next while
    if movie.rating < movies[1].rating:
        movies[0] = movie
        next while
    if movie.rating < movies[2].rating:
        movies[0] = movies[1]
        movies[1] = movie
        next while
    if movie.rating < movies[3].rating:
        movies[0] = movies[1]
        movies[1] = movies[2]
        movies[2] = movie
        next while
    if movie.rating < movies[4].rating:
        movies[0] = movies[1]
        movies[1] = movies[2]
        movies[2] = movies[3]
        movies[3] = movie
        next while
    movies[0] = movies[1]
    movies[1] = movies[2]
    movies[2] = movies[3]
    movies[3] = movies[4]
    movies[4] = movie

At the end, you have your sorted list of movies. If there's less than 5, those others will have a rating of -1 so you'll know they're invalid. This is assuming that the rating on a real movie is zero or greater but you can adjust the values if they're not.

If you need to adjust it for more than 5 movies, you can. The best bet would be to roll up the loop again. At some point, however, it's going to become more efficient to sort it than use this method. This method's only really good for a small data set.

Upvotes: 3

mozboz
mozboz

Reputation: 1109

My method works, but requires a sort on the list after every read.

No it doesn't, it only requires a sort after you find a new movie whos rating is > movies[0][rating].

This method seems efficient to me. You only sort occasionally when there's a new entry for the top 5, which will happen less the more movies you process.

Upvotes: 1

PanMan
PanMan

Reputation: 1314

How big is the list? I'm guessing it's not an option to keep the entire list in memory, and sort it at the end?

Upvotes: 0

Phil H
Phil H

Reputation: 20151

Linked lists would work here.

Build a linked list that chains the first 5 movies in the correct order. For each new movie, just start at the the end of the chain and walk it until your movie is between one with a higher rating and one with a lower rating. Then insert your link into the list here. If the movie was better than the worst (and thus your list is now 6 long), just remove the last link in the chain, and you are back to 5.

No sorting, no indexing.

Upvotes: 4

dirkgently
dirkgently

Reputation: 111200

Your algorithm looks fine. I am not sure how the arrays are implemented in PHP. From an algorithm point of view: use a heap instead of an array.

Upvotes: 3

Related Questions