Panki0

Reputation: 45

How do I build a text scanner in PHP?

I want to build a text scanner in PHP that finds similar words, but I do not know where to begin. The scanner has to go through a paragraph and either point out words that match entries in a database or propose specific words that would improve the paragraph.

I thought at first that I could use a database and a search engine script but I have been told that this is not the way to do it.

Can someone please point me in the right direction so I can start working on this?

Upvotes: 2

Views: 648

Answers (3)

HBv6

Reputation: 3537

I'm posting another answer because my first one turned out to be wrong after the OP's comment, and it had collected too many comments.

First you need to extract every single word from your paragraph, for example with:

$words_array = explode(" ", $paragraph);

Then you need to remove special characters such as slashes, periods and commas (maybe using str_replace()).

In the second step you need to build a database of synonyms like this:
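
The two steps so far (splitting and cleaning) could also be done in one pass. This is only a rough sketch: tokenize_paragraph() is a name I made up, and it assumes that lowercasing ASCII letters is good enough for your text. It uses preg_split() so that punctuation and repeated spaces are all treated as separators.

```php
<?php
// Hypothetical helper (not a built-in): split a paragraph into lowercase words.
// Anything that is not a letter, a digit or an apostrophe counts as a separator.
function tokenize_paragraph(string $paragraph): array
{
    $words = preg_split('/[^\p{L}\p{N}\']+/u', strtolower($paragraph), -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on failure (e.g. invalid UTF-8 input)
    return $words === false ? [] : $words;
}

$words_array = tokenize_paragraph("The car, a red car, broke down.");
print_r($words_array); // the, car, a, red, car, broke, down
```

PREG_SPLIT_NO_EMPTY drops the empty strings that would otherwise appear at the start or end of the text.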

| id | word | synonyms  |
| 0  | car  | vehicle   |
| 1  | car  | transport |

Then, for each word of your paragraph, run a query like this (your_table is a placeholder for whatever you name the table):

SELECT synonyms FROM your_table WHERE word = 'car'

And after this you can fetch the results.

But this is only the start. You NEED to optimize this method. For example, you could make the lookup work in both directions, so that searching for vehicle returns car, and the same for transport. That's up to you!
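
Here is a small sketch of that two-way lookup, using a plain array in place of the database so it runs on its own. The rows copy the example table, and build_synonym_index() is a hypothetical helper name, not an existing function.

```php
<?php
// Rows mirror the example table: one (word, synonym) pair per row.
$rows = [
    ['word' => 'car', 'synonym' => 'vehicle'],
    ['word' => 'car', 'synonym' => 'transport'],
];

// Hypothetical helper: build an index in which each word of a pair points to the other.
function build_synonym_index(array $rows): array
{
    $index = [];
    foreach ($rows as $row) {
        $index[$row['word']][$row['synonym']] = true; // car -> vehicle
        $index[$row['synonym']][$row['word']] = true; // vehicle -> car
    }
    // Using the words as keys removed duplicates; now keep only the word lists
    return array_map('array_keys', $index);
}

$index = build_synonym_index($rows);
print_r($index['vehicle'] ?? []); // car
print_r($index['car'] ?? []);     // vehicle, transport
```

With a real table you could probably get the same effect by also matching on the synonyms column in the WHERE clause, but I have not tested that against your schema.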

Upvotes: 1

cb0

Reputation: 8613

Searching for similarities in text can be really hard work. If you want to take the paragraph and check whether there's a similar text in the database, I would advise using the TF-IDF ("tfidf") algorithm. I used it in my thesis and it worked fine.

However, there is no "master" algorithm that does everything you need. It takes a lot of research, and it always depends on the properties of the text you'll use. Some knowledge of NLP could also help in solving such problems.

For finding only word similarities I would use something like this. Hope this helps.
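
To give an idea of what TF-IDF does, here is a very reduced sketch. It is not the code from my thesis; the function names, the tokenizer and the sample texts are all made up for illustration. Each text becomes a vector of term weights (term frequency times log(N / document frequency)), and two texts are compared by the cosine of their vectors.

```php
<?php
// Hypothetical helper: lowercase a text and split it into ASCII words.
function tokens(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Build one TF-IDF vector (term => weight) per document.
function tfidf_vectors(array $documents): array
{
    $tokenized = array_map('tokens', $documents);
    $docCount = count($documents);

    // Document frequency: in how many documents does each term occur?
    $df = [];
    foreach ($tokenized as $terms) {
        foreach (array_unique($terms) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    $vectors = [];
    foreach ($tokenized as $i => $terms) {
        $vectors[$i] = [];
        $total = count($terms);
        foreach (array_count_values($terms) as $term => $count) {
            // tf = relative frequency in this document, idf = log(N / df)
            $vectors[$i][$term] = ($count / $total) * log($docCount / $df[$term]);
        }
    }
    return $vectors;
}

// Cosine similarity between two sparse vectors; 0.0 if either one is all zeros.
function cosine(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        $dot += $weight * ($b[$term] ?? 0.0);
    }
    $norm = sqrt(array_sum(array_map(fn($w) => $w * $w, $a)))
          * sqrt(array_sum(array_map(fn($w) => $w * $w, $b)));
    return $norm > 0 ? $dot / $norm : 0.0;
}

$docs = [
    "the red car broke down",               // the paragraph to check
    "my red car broke down on the road",    // stored text 1
    "fresh bread and butter for breakfast", // stored text 2
];
$v = tfidf_vectors($docs);
echo cosine($v[0], $v[1]) > cosine($v[0], $v[2]) ? "text 1 is closer\n" : "text 2 is closer\n";
```

Note that with this plain log(N / df) weighting, a term that occurs in every document gets weight 0, so very small collections give poor results. Real implementations usually add smoothing and stop-word handling.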

Upvotes: 2

HBv6

Reputation: 3537

Have you already tried similar_text()? It's very easy to use, and you can easily adapt it to work with a DB (where the DB may be a text file, a SQL DB or even an array).

Quick example:

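// Call this function once for every pair: each word of the paragraph against each word of your DB of suggestions.
// $threshold is a percentage (0-100); the default of 80 is only a starting guess, tune it for your data.
function suggest($word_of_the_paragraph, $word_taken_from_a_DB, $threshold = 80) {
    // similar_text() writes the similarity percentage into $percent
    similar_text($word_of_the_paragraph, $word_taken_from_a_DB, $percent);
    if ($percent >= $threshold) {
        echo $word_taken_from_a_DB; // this is the suggested word
    }
}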
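
To run the comparison over a whole paragraph, something like the following could work. Take it as a sketch: find_suggestions() is a name I made up, the 80 percent threshold is a guess you will need to tune, and the word lists are sample data standing in for your paragraph and your DB.

```php
<?php
// Hypothetical helper: for each word of the paragraph, collect the known words
// whose similar_text() percentage reaches the threshold.
function find_suggestions(array $paragraph_words, array $known_words, float $threshold = 80.0): array
{
    $suggestions = [];
    foreach ($paragraph_words as $word) {
        foreach ($known_words as $candidate) {
            similar_text($word, $candidate, $percent);
            // Skip exact matches: suggesting a word for itself is not useful
            if ($word !== $candidate && $percent >= $threshold) {
                $suggestions[$word][] = $candidate;
            }
        }
    }
    return $suggestions;
}

$result = find_suggestions(['vehical', 'bread'], ['vehicle', 'transport', 'bread']);
print_r($result); // only 'vehical' gets a suggestion: 'vehicle'
```

Keep in mind that this compares every paragraph word with every known word, so it gets slow with a large word list.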

Upvotes: 0
