gom
gom

Reputation: 897

PHP - Build a sorted list of top k items

By 'list' i mean the English word, not necessary linked list. You can use any data structure. However, PHP has inbuilt support for some data structures: https://www.php.net/manual/en/spl.datastructures.php from which min heap is seeming suitable to me for my problem. Though i don't know how to use the PHP's min heap facility.

Say a loop is reading from database and outputting some user ids and with each user id a score of how similar the user's name is compared to an inputted word. After end of the loop, i want to view top 10 users in decreasing order of the score. The score calculation is done inside the loop.

Easiest way for me to do this is: While calculating the scores (inside the loop), store all user ids with their scores in an array. After all scores are stored, sort the array using PHP's inbuilt sort facility. Display the top 10 elements from the array.
But why bother (the system) in storing and sorting all the scores when i want just 10 top users. So, any good method?

Another possible solution i'm imagining is like, feel free to ignore:

Maintaining a linked list of scores in decreasing order. After it reaches length 10, then when receive a new score, check whether it's smaller than the score in right-most node (the 10th node), if it's then discard it, if not then discard the right-most node and insert the new score at its proper place by checking whether it's smaller than 5th (middle) element of the linked list, if it's then compare the same with 7th (middle of 5th and 9th) and so on.

PS: I've no problem with top k elements being sorted after they all have been selected.

Upvotes: 0

Views: 382

Answers (2)

gom
gom

Reputation: 897

$topMatches = new SplMinHeap();

/* Building the list */
while($user = mysqli_fetch_assoc($users)){
 .. calculate score of the $user against the inputted word ..
 if($topMatches->count() === $k)
  if($topMatches->top()[0] < $score) //haven't && both if's cause ->top will give error when heap empty
   $topMatches->extract();
 if($topMatches->count() !== $k)
  $topMatches->insert([$score, $user['id']]);
}

Outputting the above created min heap:
Check whether $topMatches isEmpty() or alternatively its count() is 0. If it's then return;. Next:

do{
 list($score, $userid) = $topMatches->extract();
 //echoing
}while($topMatches->valid());

Upvotes: 0

trincot
trincot

Reputation: 350715

You can use a min-heap or min-priority queue (which in PHP is slightly different). When that heap has k elements, then exchange the heap's top element when you find an entry with a better score than that least score in the heap. Then you will end up with the k top entries, with the least score at the top. So as a final step you would extract the entries from the heap and reverse their order.

Here is how that could look, using SplPriorityQueue. Note that this structure puts maximum priority values at the top, so we would provide it negative scores, so to get the minimum scores at the top of the heap/queue:

function getTop($input, $k) {
    $q = new SplPriorityQueue();
    $q->setExtractFlags(SplPriorityQueue::EXTR_PRIORITY);
    foreach ($input as $entry) {
        if ($q->count() < $k) {
            $q->insert($entry, -$entry["score"]); // negate score to get lower scores first
        } else if ($entry["score"] > -$q->top() ) { // better score than least in queue? Exchange
            $q->extract();
            $q->insert($entry, -$entry["score"]);
        }
    }
    $q->setExtractFlags(SplPriorityQueue::EXTR_DATA);
    return array_reverse(iterator_to_array($q));
}

Here is some sample input data and how you would call the above function:

$input = [
    ["user" => "a", "score" => 17],
    ["user" => "b", "score" =>  3],
    ["user" => "c", "score" => 10],
    ["user" => "d", "score" => 11],
    ["user" => "e", "score" =>  5],
    ["user" => "f", "score" => 19],
    ["user" => "g", "score" =>  7],
    ["user" => "h", "score" =>  2],
    ["user" => "i", "score" => 18],
    ["user" => "j", "score" => 12],
    ["user" => "k", "score" => 10],
    ["user" => "l", "score" =>  6],
    ["user" => "m", "score" =>  9],
    ["user" => "n", "score" => 15],
];

$top = getTop($input, 5);

print_r($top);

Upvotes: 1

Related Questions