Mathijs Segers

Reputation: 6238

Get Popular words in PHP+MySQL

How do I go about getting the most popular words from multiple content tables in PHP/MySQL?

For example, I have a table forum_post with forum posts; it contains a subject and content. Besides this, I have multiple other tables with different fields which could also contain content to be analysed.

I would probably fetch all the content myself, strip any HTML, explode the string on spaces, remove quotes, commas, etc., and then count the words that are not common by building up an array while running through all the words.
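Roughly, the approach described above could be sketched like this. It is only a minimal illustration: the function name countPopularWords and the stop-word list are placeholders, the input is assumed to be an array of strings already fetched from the tables, and the tokenizing is ASCII-only.

```php
<?php
// Hypothetical helper: counts non-common words across a list of content strings.
function countPopularWords(array $texts, array $stopWords)
{
    $counts = array();
    foreach ($texts as $text) {
        // Put a space before every tag so "<p>a</p><p>b</p>" does not become "ab",
        // then strip the HTML, decode entities and lowercase the result.
        $plain = strtolower(html_entity_decode(strip_tags(str_replace('<', ' <', $text))));

        // Split on anything that is not an ASCII letter, digit or apostrophe.
        $tokens = preg_split("/[^a-z0-9']+/", $plain, -1, PREG_SPLIT_NO_EMPTY);

        foreach ($tokens as $token) {
            // Remove surrounding quotes, keep inner apostrophes ("cat's").
            $token = trim($token, "'");
            if ($token === '' || in_array($token, $stopWords, true)) {
                continue; // skip empty tokens and common words
            }
            $counts[$token] = isset($counts[$token]) ? $counts[$token] + 1 : 1;
        }
    }

    // Most frequent words first.
    arsort($counts);
    return $counts;
}

// Example usage with made-up content and a tiny, assumed stop-word list.
$popular = countPopularWords(
    array('<p>The cat</p><p>sat on the cat</p>', "Cat's toy, the cat."),
    array('the', 'on')
);
```

Here $popular maps each remaining word to its count, so the first key is the most popular word.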

My main question is whether someone knows of a method that might be easier or faster.

I couldn't find any helpful answers about this, though I may have been using the wrong search terms.

Upvotes: 9

Views: 2770

Answers (2)

bestprogrammerintheworld

Reputation: 5520

I see you've accepted an answer, but I want to offer an alternative that might be more flexible (decide for yourself :-)). I have not tested the code, but I think you get the picture. $dbh is a PDO connection object. What you do with the resulting $words array is up to you.

<?php
// $dbh is an existing PDO connection object.
$words = array('word' => array(), 'wordcount' => array());

countWordsFromTable($dbh, $words, 'party');  //The name of the table
countWordsFromTable($dbh, $words, 'party2'); //The name of the table

//Example output array:
/*
$words['word'][0] = 'happy'; //Happy from table party
$words['wordcount'][0] = 5;
$words['word'][1] = 'bulldog'; //Bulldog from table party2
$words['wordcount'][1] = 15;
$words['word'][2] = 'pokerface'; //Pokerface from table party2
$words['wordcount'][2] = 2;
*/

if (!empty($words['wordcount'])) {
    //Get all indexes holding the max value of the wordcount list
    $maxValues = array_keys($words['wordcount'], max($words['wordcount']));
    $popularIndex = $maxValues[0]; //Get only one value...
    $mostPopularWord = $words['word'][$popularIndex];
}

function countWordsFromTable(PDO $dbh, array &$words, $tableName) {

    //Table and column names cannot be bound as parameters,
    //so validate the table name before putting it into the query
    if (!preg_match('/^[A-Za-z0-9_]+$/', $tableName)) {
        throw new InvalidArgumentException("Invalid table name: $tableName");
    }

    //Get all fields from specific table (first column of DESCRIBE is the field name)
    $q = $dbh->query("DESCRIBE `$tableName`");
    $tableFields = $q->fetchAll(PDO::FETCH_COLUMN);

    //Go through all fields and store count of words and their content in array $words
    foreach ($tableFields as $dbCol) {

        //Quote the column name as an identifier
        $col = '`' . str_replace('`', '``', $dbCol) . '`';

        //Get the content of every row in this column and its number of space-separated words
        $wordCountQuery = "SELECT $col AS word, LENGTH($col) - LENGTH(REPLACE($col, ' ', '')) + 1 AS wordcount FROM `$tableName`";
        $q = $dbh->query($wordCountQuery);
        $wrds = $q->fetchAll(PDO::FETCH_ASSOC);

        //Add result to array $words
        foreach ($wrds as $w) {
            $words['word'][] = $w['word'];
            $words['wordcount'][] = $w['wordcount'];
        }

    }
}
?>

Upvotes: 0

AbsoluteƵERØ

Reputation: 7870

Somebody's already done it.

The magic you're looking for is a PHP function called str_word_count().

In my example code below, if you get a lot of extraneous words, you'll need to write custom stripping to remove them. You'll also want to strip all HTML tags and other unwanted characters from the text first.

I use something similar to this for keyword generation (obviously that code is proprietary). In short, we take the provided text, count how often each word occurs, and sort the words by frequency, so the most frequent words come first in the output. Words that occur only once are not counted.

<?php
$text = "your text.";

// Set up the array for storing word counts
$freqData = array();
foreach (str_word_count($text, 1) as $word) {
    // Increment the count for this word, starting at 1 the first time it is seen
    if (isset($freqData[$word])) {
        $freqData[$word]++;
    } else {
        $freqData[$word] = 1;
    }
}

// Sort by frequency, most frequent first
arsort($freqData);

$list = '';
foreach ($freqData as $word => $count) {
    // Skip words that occur only once
    if ($count > 1) {
        $list .= "$word ";
    }
}
if (empty($list)) {
    $list = "Not enough duplicate words for popularity contest.";
}
echo $list;
?>
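The custom stripping mentioned above might look something like the following. Treat it as a sketch under assumptions rather than the code I actually use: popularWords, its $minCount parameter and the stop-word list are made up for illustration, and str_word_count() only recognises plain letters, apostrophes and hyphens by default.

```php
<?php
// Hypothetical helper: returns words that occur at least $minCount times,
// most frequent first, after removing HTML tags and stop words.
function popularWords($html, array $stopWords, $minCount = 2)
{
    // Add a space before each tag so adjacent elements do not merge into one word,
    // then strip the tags and lowercase the text.
    $text = strtolower(strip_tags(str_replace('<', ' <', $html)));

    // Format 1 makes str_word_count() return the list of words found.
    $words = str_word_count($text, 1);

    // Drop the extraneous (stop) words.
    $words = array_diff($words, $stopWords);

    // Count occurrences and sort by frequency, highest first.
    $freq = array_count_values($words);
    arsort($freq);

    // Keep only words that reach the minimum count.
    $kept = array_filter($freq, function ($count) use ($minCount) {
        return $count >= $minCount;
    });

    return array_keys($kept);
}

// Example usage with made-up input.
$keywords = popularWords('<b>PHP</b> and MySQL and PHP<br>and more PHP', array('and'));
```

With that input, "and" is filtered out as a stop word and only "php" occurs more than once, so $keywords contains just that one word.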

Upvotes: 4
