Steve Farmer
Steve Farmer

Reputation: 35

Counting most common used words in table, filtering out stop words

I created a table that is populated by the responses people stated as the first thing that came into their mind when they viewed a photo. I have ~1400 entries. Now, I want to see what is the most common description.

CREATE TABLE descript (
wordID int NOT NULL AUTO_INCREMENT PRIMARY KEY,
wordText TEXT(50)
)
ENGINE=MyISAM;

INSERT INTO descript VALUES(0,"Big");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"fast");
INSERT INTO descript VALUES(0,"impressive");
INSERT INTO descript VALUES(0,"big");
INSERT INTO descript VALUES(0,"big");
INSERT INTO descript VALUES(0,"red");
INSERT INTO descript VALUES(0,"his");
INSERT INTO descript VALUES(0,"her");
INSERT INTO descript VALUES(0,"His");
INSERT INTO descript VALUES(0,"Black");
INSERT INTO descript VALUES(0,"black");
INSERT INTO descript VALUES(0,"black");
INSERT INTO descript VALUES(0,"blue");
INSERT INTO descript VALUES(0,"a black");
INSERT INTO descript VALUES(0,"his");
INSERT INTO descript VALUES(0,"her");
INSERT INTO descript VALUES(0,"pleasant");
INSERT INTO descript VALUES(0,"the fast");
INSERT INTO descript VALUES(0,"blue");

and on and on and on ....

I have to make it so it is lower case, which is done with this:

select LOWER(wordText) descript;

How do I go about having it count the most common answer and display it? I have a few stop words (that I do not want to be included in the count such as 'a" or 'the'. How do I go about having them not counted?

Upvotes: 3

Views: 263

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269463

The basic query is:

SELECT lower(wordText) as word, count(*)
FROM descript
GROUP BY lower(wordText)
ORDER BY count(*) DESC
LIMIT 1;

You can remove stop words using not in, if you want to include them in the query:

SELECT lower(wordText) as word, count(*)
FROM descript
WHERE lower(wordText) not in ('a', 'the', . . . )
GROUP BY lower(wordText)
ORDER BY count(*) DESC
LIMIT 1;

Alternatively, if you have them in a table:

SELECT lower(sw.wordText) as word, count(*)
FROM descript d left join
     stopwords sw
     on d.wordText = sw.word
WHERE sw.word is not null
GROUP BY lower(sw.wordText)
ORDER BY count(*) DESC
LIMIT 1;

You can learn about the stop words included with MySQL here.

Upvotes: 1

nitish
nitish

Reputation: 37

As per getting the most frequent value you can use this query.

   SELECT wordText, count(*) FROM descript GROUP BY wordText  ORDER BY count(*) DESC LIMIT 1;

Upvotes: 0

lukevp
lukevp

Reputation: 715

IF you do a

SELECT COUNT(LOWER(wordText)) FROM descript GROUP BY LOWER(wordText);

you should be able to see how many of each word there are.

You can add an

ORDER BY

clause to arrange them based on the count of each result

Upvotes: 0

Related Questions