Reputation: 474
i am working on an algorithm for sorting teams based on highest number of score. Teams are to be generated from a list of players. The conditions for creating a team is
What i did to get this result is generate all possibilities of team then run checks on them to exclude those teams that have more than 50K salary and then sort the remainder based on projection. But generating all the possibilities takes a lot of time and sometimes it consume all the memory. For a list of 160 players it takes around 90 seconds. Here is the code
$base_array = array();
$query1 = mysqli_query($conn, "SELECT * FROM temp_players ORDER BY projection DESC");
while($row1 = mysqli_fetch_array($query1))
{
$player = array();
$mma_id = $row1['mma_player_id'];
$salary = $row1['salary'];
$projection = $row1['projection'];
$wclass = $row1['wclass'];
array_push($player, $mma_id);
array_push($player, $salary);
array_push($player, $projection);
array_push($player, $wclass);
array_push($base_array, $player);
}
$result_base_array = array();
$totalsalary = 0;
for($i=0; $i<count($base_array)-5; $i++)
{
for($j=$i+1; $j<count($base_array)-4; $j++)
{
for($k=$j+1; $k<count($base_array)-3; $k++)
{
for($l=$k+1; $l<count($base_array)-2; $l++)
{
for($m=$l+1; $m<count($base_array)-1; $m++)
{
for($n=$m+1; $n<count($base_array)-0; $n++)
{
$totalsalary = $base_array[$i][1]+$base_array[$j][1]+$base_array[$k][1]+$base_array[$l][1]+$base_array[$m][1]+$base_array[$n][1];
$totalprojection = $base_array[$i][2]+$base_array[$j][2]+$base_array[$k][2]+$base_array[$l][2]+$base_array[$m][2]+$base_array[$n][2];
if($totalsalary <= 50000)
{
array_push($result_base_array,
array($base_array[$i], $base_array[$j], $base_array[$k], $base_array[$l], $base_array[$m], $base_array[$n],
$totalprojection, $totalsalary)
);
}
}
}
}
}
}
}
usort($result_base_array, "cmp");
And the cmp function
function cmp($a, $b) {
if ($a[6] == $b[6]) {
return 0;
}
return ($a[6] < $b[6]) ? 1 : -1;
}
Is there anyway to reduce the time it takes to do this task, or any other workaround for getting the desired number of teams
Regards
Upvotes: 3
Views: 690
Reputation: 1441
Because number of elements in array can be very big (for example 100 players can generate 1.2*10^9 teams), you can't hold it in memory. Try to save resulting array to file by parts (truncate array after each save). Then use external file sorting.
It will be slow, but at least it will not fall because of memory.
If you need top n teams (like 10 teams with highest projection) then you should convert code that generates result_base_array
to Generator, so it will yield
next team instead of pushing it into array. Then iterate over this generator. On each iteration add new item to sorted resulted array and cut redundant elements.
Upvotes: 1
Reputation: 351288
Depending on whether the salaries are often the cause of exclusion, you could perform tests on this in the other loops as well. If after 4 player selections their summed salaries are already above 50K, there is no use to select the remaining 2 players. This could save you some iterations.
This can be further improved by remembering the lowest 6 salaries in the pack, and then check if after selecting 4 members you would still stay under 50K if you would add the 2 lowest existing salaries. If this is not possible, then again it is of no use to try to add the two remaining players. Of course, this can be done at each stage of the selection (after selecting 1 player, 2 players, ...)
Another related improvement comes into play when you sort your data by ascending salary. If after selecting the 4th player, the above logic brings you to conclude you cannot stay under 50K by adding 2 more players, then there is no use to replace the 4th player with the next one in the data series either: that player would have a greater salary, so it would also yield to a total above 50K. So that means you can backtrack immediately and work on the 3rd player selection.
As others pointed out, the number of potential solutions is enormous. For 160 teams and a team size of 6 members, the number of combinations is:
160 . 159 . 158 . 157 . 156 . 155
--------------------------------- = 21 193 254 160
6 . 5 . 4 . 3 . 2
21 billion entries is a stretch for memory, and probably not useful to you either: will you really be interested in the team at the 4 432 456 911th place?
You'll probably be interested in something like the top-10 of those teams (in terms of projection). This you can achieve by keeping a list of 10 best teams, and then, when you get a new team with an acceptable salary, you add it to that list, keeping it sorted (via a binary search), and ejecting the entry with the lowest projection from that top-10.
Here is the code you could use:
$base_array = array();
// Order by salary, ascending, and only select what you need
$query1 = mysqli_query($conn, "
SELECT mma_player_id, salary, projection, wclass
FROM temp_players
ORDER BY salary ASC");
// Specify with option argument that you only need the associative keys:
while($row1 = mysqli_fetch_array($query1, MYSQLI_ASSOC)) {
// Keep the named keys, it makes interpreting the data easier:
$base_array[] = $row1;
}
function combinations($base_array, $salary_limit, $team_size) {
// Get lowest salaries, so we know the least value that still needs to
// be added when composing a team. This will allow an early exit when
// the cumulative salary is already too great to stay under the limit.
$remaining_salary = [];
foreach ($base_array as $i => $row) {
if ($i == $team_size) break;
array_unshift($remaining_salary, $salary_limit);
$salary_limit -= $row['salary'];
}
$result = [];
$stack = [0];
$sum_salary = [0];
$sum_projection = [0];
$index = 0;
while (true) {
$player = $base_array[$stack[$index]];
if ($sum_salary[$index] + $player['salary'] <= $remaining_salary[$index]) {
$result[$index] = $player;
if ($index == $team_size - 1) {
// Use yield so we don't need to build an enormous result array:
yield [
"total_salary" => $sum_salary[$index] + $player['salary'],
"total_projection" => $sum_projection[$index] + $player['projection'],
"members" => $result
];
} else {
$index++;
$sum_salary[$index] = $sum_salary[$index-1] + $player['salary'];
$sum_projection[$index] = $sum_projection[$index-1] + $player['projection'];
$stack[$index] = $stack[$index-1];
}
} else {
$index--;
}
while (true) {
if ($index < 0) {
return; // all done
}
$stack[$index]++;
if ($stack[$index] <= count($base_array) - $team_size + $index) break;
$index--;
}
}
}
// Helper function to quickly find where to insert a value in an ordered list
function binary_search($needle, $haystack) {
$high = count($haystack)-1;
$low = 0;
while ($high >= $low) {
$mid = (int)floor(($high + $low) / 2);
$val = $haystack[$mid];
if ($needle < $val) {
$high = $mid - 1;
} elseif ($needle > $val) {
$low = $mid + 1;
} else {
return $mid;
}
}
return $low;
}
$top_team_count = 10; // set this to the desired size of the output
$top_teams = []; // this will be the output
$top_projections = [];
foreach(combinations($base_array, 50000, 6) as $team) {
$j = binary_search($team['total_projection'], $top_projections);
array_splice($top_teams, $j, 0, [$team]);
array_splice($top_projections, $j, 0, [$team['total_projection']]);
if (count($top_teams) > $top_team_count) {
// forget about lowest projection, to keep memory usage low
array_shift($top_teams);
array_shift($top_projections);
}
}
$top_teams = array_reverse($top_teams); // Put highest projection first
print_r($top_teams);
Have a look at the demo on eval.in, which just generates 12 players with random salary and projection data.
Even with the above mentioned optimisations, doing this for 160 teams might still require a lot of iterations. The more often the salaries amount to more than 50K, the better the performance will be. If this never happens, the algorithm cannot escape from having to look at each of the 21 billion combinations. If you would know beforehand that the 50K limit would not play any role, you would of course order the data by descending projection, like you originally did.
Another optimisation could be if you would feed back into the combination function the 10th highest team projection you have so far. The function could then eliminate combinations that would lead to a lower total projection. You could first take the 6 highest player projection values and use this to determine how high a partial team projection can still grow by adding the missing players. It might turn out that this becomes impossible after having selected a few players, and then you can skip some iterations, much like done on the basis of salaries.
Upvotes: 0