Reputation: 185
I have an array like this:
['ball', 'football', 'volleyball', 'football player', 'football league', 'tennis']
I want to sort it like the following based on "football" keyword:
['football', 'football player', 'football league', 'ball', 'volleyball', 'tennis']
How can I achieve this?
Upvotes: 2
Views: 4543
Reputation: 34914
I tried to understand your question, and tried to solve like this
<?php
$abc =["ball","football","volleyball","football player", "football league", "tennis"];
$word ="football";
$final = array();
// collect complete match
foreach($abc as $key=>$value){
if($value==$word){
$final[] = $value;
unset($abc[$key]);
}
}
//collect if word found in another string
foreach($abc as $key=>$value){
if(strpos($value,$word)!==false){
$final[] = $value;
unset($abc[$key]);
}
}
// collect if another string have some part of word
foreach($abc as $key=>$value){
if(strpos($word,$value)!==false){
$final[] = $value;
unset($abc[$key]);
}
}
// collect rest of the elements
$final = array_merge($final,$abc);
print_r($final);
?>
Ouput is
Array
(
[0] => football
[1] => football player
[2] => football league
[3] => ball
[4] => volleyball
[5] => tennis
)
Check here: https://eval.in/593747
Upvotes: 1
Reputation: 12376
This is a fun little problem where you need to assign some sort of score to each element of your haystack (the array of words). I think the best way to score each element is based on a classic dynamic programming problem "longest common substring". The longer this substring, the higher the sort score.
//find the longest commson substring between 2 strings, return all substrings of that length
function longestCommonSubstring($string1, $string2) {
$helper = array();
//create two dimensional array, to keep track
for($i =0; $i < strlen($string1); $i++) {
$helper[$i] = array();
for($j=0; $j< strlen($string2); $j++) {
//intialize all values to 0
$helper[$i][] = 0;
}
}
$max= 0;
$ans = array();
for($i =0; $i <strlen($string1); $i++) {
for($j =0; $j < strlen($string2); $j++) {
if ($string1[$i] == $string2[$j]) {
if($i==0 || $j==0) {
$helper[$i][$j] = 1;
} else {
$helper[$i][$j] = $helper[$i-1][$j-1] + 1;
}
if ($helper[$i][$j] > $max) {
$max = $helper[$i][$j];
$ans = array(substr($string1, $i-$max+1, $max));
} elseif($helper[$i][$j] == $max) {
$ans[] = substr($string1, $i-$max+1, $max);
}
} else {
$helper[$i][$j] = 0;
}
}
}
return $ans;
}
Now that the function is written we need to use it.
foreach($words as $word) {
$lcs = longestCommonSubstring($keyword, $word);
}
Well, that is all and good, but just using the function is only half the battle, now we need to apply some logic to results. Let's save the results in an array and give each word a score. A good score would be the length of longest substring. football
would be a better match than ball
because it has a longer string in common. But what about football
and football player
, they both have the same length of longest common substring? To solve this problem, we could use length as a percentage of the total word length. Combining these two idea, longest substring length and percentage, we get the code below.
//an associative array to save the scores
// $wordsMeta[$word] = array(lengthOfCommonSubstring, percentageOfWordMatched)
$wordsMeta = array();
//go through each word and assign a score
foreach($words as $word) {
$lcs = longestCommonSubstring($keyword, $word);
if (count($lcs) ==0 ) {
$wordPercentage = 0;
$wordLength = 0;
} else {
$wordLength = strlen($lcs[0]);
$wordPercentage = $wordLength/strlen($word);
}
$wordsMeta[$word] = array(
"percentageOfWordMatched" => $wordPercentage,
"lengthOfCommonSubstring" => $wordLength
);
}
Now we just need a sorting function that looks at length first, if they are equal, it will look at percentage and return an appropriate integer.
//our special sorting function
//checks length, if that is equal, then it checks percentage of word matched
//if both are eqaul, then those two elements are considered equal
$sort = function($a, $b) {
$ans = $a["lengthOfCommonSubstring"] - $b["lengthOfCommonSubstring"];
if ($ans == 0) {
$ans = $a["percentageOfWordMatched"] - $b["percentageOfWordMatched"];
}
if ($ans < 0) {
$ans = -1;
} elseif ($ans > 0){
$ans = 1;
} else {
$ans = 0;
}
//higher number = lower sort order
$ans *= -1;
return $ans;
};
Now the easy part: uasort($wordsMeta)
and $answer= array_keys($wordsMeta)
Beware of Demons - This algorithm is slow. Very slow. lcs
is O(n*m)
, and we are calling it count($words)
times. Making the scoring process O(n*m*x)
where:
n
is strlen($keyword)
m
is strlen($word)
x
is count($words)
Additionally we are sorting, which is O(n * log(n))
. So in total this algorithm is O(n*m*x + n*log(n))
which is not good. Keeping the wordlist short, the words in the word list short, and the keyword short will keep speed down.
Upvotes: 0
Reputation: 41810
If you want to sort based on whether the keyword is found anywhere in the word rather than just at the beginning, you can use strpos
in the comparison function.
$keyword = 'football';
usort($things, function($a, $b) use ($keyword) {
$x = strpos($a, $keyword) === false;
$y = strpos($b, $keyword) === false;
if ($x && !$y) return 1;
if ($y && !$x) return -1;
// use this if you want to sort alphabetically after the keyword sort:
return strcmp($a, $b);
// or if you only want to sort by whether or not the keyword was found:
return 0;
});
If you have a more general goal of sorting the array based on "closeness" of its terms to the keyword, the comparison must become more complex, and the way it should be done really depends on what aspects of "closeness" are most important to you. Here is an example of a more complex sort, probably not exactly what you want, but just to show what I mean about the possible complexity of determining "closeness":
$keyword = 'football';
usort($things, function($a, $b) use ($keyword) {
// prioritize exact matches first
if ($a == $keyword) return -1;
if ($b == $keyword) return 1;
// prioritize terms containing the keyword next
$x = strpos($a, $keyword);
$y = strpos($b, $keyword);
if ($x !== false && $y === false) return -1;
if ($y !== false && $x === false) return 1;
if ($x !== false && $y !== false) { // both terms contain the keyword, so...
if ($x != $y) { // prioritize matches closer to the beginning of the term
return $x > $y ? 1 : -1;
}
// both terms contain the keyword at the same position, so...
$al = strlen($a);
$bl = strlen($b);
if ($al != $bl) { // prioritize terms with fewer characters other than the keyword
return $al > $bl ? 1 : -1;
}
// both terms contain the same number of additional characters
return 0;
// or sort alphabetically with strcmp($a, $b);
// or do additional checks...
}
// neither terms contain the keyword
// check the character similarity...
$ac = levenshtein($keyword, $a);
$bc = levenshtein($keyword, $b);
if ($ac != $bc) {
return $ac > $bc ? 1 : -1;
}
return 0;
// or sort alphabetically with strcmp($a, $b);
// or do additional checks, similar_text, etc.
});
Upvotes: 2
Reputation: 825
You need to make a custom sort function, and then use it with usort.
$array=["ball","football","volleyball","football player","football league","tennis"];
function footsort($a,$b) {
$afoot=substr($a,0,8)=="football";
$bfoot=substr($b,0,8)=="football";
if ($afoot==$bfoot) return strcmp($a,$b);
/*else*/
if ($afoot) return -1;
if ($bfoot) return 1;
}
usort($array,"footsort");
print_r($array);
Response:
Array
(
[0] => football
[1] => football league
[2] => football player
[3] => ball
[4] => tennis
[5] => volleyball
)
Upvotes: 4