Trying to extract keywords from a website PHP (OOP)

Question

Haha, I still have the keyword problem, but here is the code I'm working on.

It's poor code, but it's my own:

loadHTMLFile($url);
    $webhtml = $doc->getElementsByTagName('p');
    $webhtml = $webhtml ->item(0)->nodeValue;

    $webhtml = strip_tags($webhtml);
    $webhtml = explode(" ", $webhtml);

    foreach($listanegra as $key=> $ln) {
    $webhtml = str_replace($ln, " ", $webhtml);
    }
    $palabras = str_word_count ("$webhtml", 1 ); 
    $frq = array_count_values ($palabras); 
    $frq = asort($frq);
    $ffrq = count($frq);
$i=1;
while ($i < $ffrq) {
    print $frqq[$i];
    print '
';
    $i++;
}
}
?>

The code tries to extract keywords from a website. It takes the first paragraph of a page and deletes the words listed in the variable "$listanegra". Next, it counts how often each remaining word is repeated and saves all the words in an array. Then I read the array, which should show me the words.

The problem is... the code doesn't work =(.

When I run the code, it shows a blank page.

Could you help me finish my code? I was recommended to use "tf-idf", but I will use that later.

kittycat · Accepted Answer

I do believe this is what you were trying to do:

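$url = 'http://es.wikipedia.org/wiki/Animalia';

$words = Keys($url);

// do your database stuff with $words


// Returns the words of the first <p> on the page, minus the stop words,
// as one space-separated string ordered from least to most frequent.
function Keys($url)
{
    // Stop words to ignore (duplicates removed)
    $listanegra = array('a', 'ante', 'bajo', 'con', 'contra', 'de', 'desde', 'mediante', 'durante', 'hasta', 'hacia', 'para', 'por', 'que', 'qué', 'cuán', 'cuan', 'los', 'las', 'una', 'unos', 'unas', 'donde', 'dónde', 'como', 'cómo', 'cuando', 'porque', 'según', 'sin', 'tras', 'mas', 'más', 'pero', 'del');

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep warnings about malformed HTML quiet
    $doc->loadHTMLFile($url);
    $parrafos = $doc->getElementsByTagName('p');
    if ($parrafos->length === 0)
    {
        return ''; // the page has no paragraph to read
    }
    $webhtml = $parrafos->item(0)->nodeValue;
    $webhtml = strip_tags($webhtml);
    $webhtml = explode(' ', $webhtml);

    $palabras = array();
    foreach($webhtml as $word)
    {
        // remove leading/trailing punctuation and spaces, then lowercase
        // (mb_strtolower needs the mbstring extension; it also lowercases accented letters)
        $word = mb_strtolower(trim($word, ' .,!?()'), 'UTF-8');
        if ($word !== '' && !in_array($word, $listanegra))
        {
            $palabras[] = $word;
        }
    }
    $frq = array_count_values($palabras); // word => number of occurrences
    asort($frq); // sorts in place, ascending by count; use arsort() for most frequent first
    return implode(' ', array_keys($frq));
}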
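
As far as I can tell, the code as posted prints nothing for three reasons: asort() sorts the array in place and returns a boolean, so `$frq = asort($frq);` throws the counts away; str_replace() on the exploded array returns an array, so "$webhtml" becomes the string "Array"; and the loop reads $frqq, which is never set in the code shown. Below is a minimal sketch of the same counting steps on a hard-coded string, so it can be tried without fetching a page. The function name contarPalabras, the sample sentence, and the two-word stop list are made up for illustration, and it uses arsort() instead of asort() so the most frequent word comes first:

```php
<?php
// Hypothetical helper for illustration: counts the words of $texto
// that are not in $listanegra and returns word => count, highest count first.
function contarPalabras(string $texto, array $listanegra): array
{
    $palabras = array();
    foreach (explode(' ', $texto) as $word) {
        // strip surrounding punctuation, then lowercase
        $word = strtolower(trim($word, ' .,!?()'));
        if ($word !== '' && !in_array($word, $listanegra)) {
            $palabras[] = $word;
        }
    }
    $frq = array_count_values($palabras);
    arsort($frq); // sorts in place; do not assign its return value back to $frq
    return $frq;
}

// Made-up sample text and stop list
$texto = 'Los animales comen. Los animales duermen, y los animales juegan.';
$frq = contarPalabras($texto, array('los', 'y'));

foreach ($frq as $palabra => $veces) {
    print $palabra . ': ' . $veces . "\n";
}
```

With this sample, the first line printed is `animales: 3`, and the other three words follow with a count of 1 each.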
