Adam

Reputation: 20962

MySQL: count all words in a column

I have 2 columns in a table and I would like a rough report of the total number of words. Is it possible to run a MySQL query that finds the total number of words down a column?

A word would basically be any text separated by a space or multiple spaces. It doesn't need to be 100% accurate, as it's just a general guide.

Is this possible?

Upvotes: 6

Views: 15046

Answers (4)

Rick Hoving

Reputation: 3575

Try something like this:

SELECT COUNT(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1)
FROM table

This takes the number of characters in your column and subtracts the number of characters left after removing all the spaces. The difference is the number of spaces in the row, and adding 1 gives you roughly the number of words. It is rough because a double space counts as an extra word, but you say you only want an estimate, so this should suffice.
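To make the arithmetic concrete, here is a minimal Python sketch that mirrors the per-row expression outside the database. The function name `rough_word_count` is made up for illustration, and the sketch assumes plain ASCII text (MySQL's `LENGTH` counts bytes, while Python's `len` counts characters, so the two can differ for multi-byte text).

```python
def rough_word_count(text: str) -> int:
    """Mirror LENGTH(col) - LENGTH(REPLACE(col, ' ', '')) + 1 for a single row."""
    spaces = len(text) - len(text.replace(" ", ""))
    return spaces + 1


print(rough_word_count("count all the words"))   # 4
print(rough_word_count("count  all the words"))  # 5: the double space adds one
```

The second call shows where the roughness comes from: every extra space adds one to the result.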

Upvotes: 17

FanoFN

Reputation: 7124

I stumbled upon this post while looking for an answer myself. I tested all of the answers here, and the closest one was @fikre's. However, I was concerned about data with leading spaces and/or extra spaces between words (trailing spaces didn't seem to affect fikre's query in my testing). So I looked for a way to find any extra spaces between words and remove them. I found a few answers using advanced functions (which are beyond my skill set), but I also found a very simple way to do it.

tl;dr: @fikre's answer is the only one that worked for me, but I made a minor tweak to get a more accurate word count.

-- Query 1: this returns a "Word Count" of 5
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count" FROM
(SELECT TRIM(REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ')) AS input
FROM (SELECT ' too   late  to the     party ' AS input) i) r;

-- Query 2: this returns a "Word Count" of 13
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count" 
FROM (SELECT ' too   late  to the     party ' AS input) i;
-- Breakdown of ' too   late  to the     party ':
--   1 leading space = 1 word count
--   2 spaces after the first space following 'too' = 2 word count
--   1 space after the first space following 'late' = 1 word count
--   4 spaces after the first space following 'the' = 4 word count
--   the trailing space wasn't counted at all
--   Total: 1 + 2 + 1 + 4 = 8 extra spaces, plus 5 for the words = 13

So, basically, even if a row contains a million spaces in between (disclaimer: that is an assumption; I've only tested 336,896 spaces), Query 1 will still return a word count of 5.

Note: The mid part REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ') I took from this answer https://stackoverflow.com/a/55476224/10910692
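For anyone wondering why the three nested REPLACE calls collapse a run of spaces into one, here is a rough Python sketch of the same steps. The function name `collapse_spaces` is hypothetical, and the sketch assumes the text does not already contain `<` or `>` characters, since those would interfere with the markers.

```python
def collapse_spaces(text: str) -> str:
    """Mirror TRIM(REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' '))."""
    marked = text.replace(" ", "<>")      # every space becomes the marker "<>"
    merged = marked.replace("><", "")     # a run "<><><>" shrinks to a single "<>"
    single = merged.replace("<>", " ")    # each surviving marker becomes one space
    return single.strip(" ")              # TRIM removes leading/trailing spaces


cleaned = collapse_spaces(" too   late  to the     party ")
print(repr(cleaned))  # 'too late to the party'
print(len(cleaned) - len(cleaned.replace(" ", "")) + 1)  # 5
```

Because a run of any length reduces to one marker in the second step, the number of consecutive spaces should not matter.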

Upvotes: 0

fikre

Reputation: 161

COUNT simply gives you the number of rows found. You need to use SUM instead.

SELECT SUM(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1) FROM table
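The difference is easy to see on a tiny table. The sketch below uses Python's built-in `sqlite3` module rather than MySQL (an assumption made only so the example is self-contained; `LENGTH`, `REPLACE`, `COUNT`, and `SUM` behave the same way for this purpose), and the table `notes` and column `body` are made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [("one two three",), ("four five",)],
)

# Per-row rough word count: number of spaces plus one.
expr = "LENGTH(body) - LENGTH(REPLACE(body, ' ', '')) + 1"

count_result = conn.execute(f"SELECT COUNT({expr}) FROM notes").fetchone()[0]
sum_result = conn.execute(f"SELECT SUM({expr}) FROM notes").fetchone()[0]

print(count_result)  # 2: the number of rows, not words
print(sum_result)    # 5: 3 words + 2 words
```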

Upvotes: 16

ypercubeᵀᴹ

Reputation: 115660

A less rough count:

SELECT LENGTH(column) - LENGTH(REPLACE(column, SPACE(1), '')) 
FROM
  ( SELECT CONCAT(TRIM(column), SPACE(1)) AS column
    FROM
      ( SELECT REPLACE(column, SPACE(2), SPACE(1)) AS column
        FROM 
          ( SELECT REPLACE(column, SPACE(3), SPACE(1)) AS column
            FROM 
              ( SELECT REPLACE(column, SPACE(5), SPACE(1)) AS column
                FROM 
                  ( SELECT REPLACE(column, SPACE(9), SPACE(1)) AS column
                    FROM 
                      ( SELECT REPLACE(column, SPACE(17), SPACE(1)) AS column
                        FROM 
                          ( SELECT REPLACE(column, SPACE(33), SPACE(1)) AS column
                            FROM tableX
                          ) AS x
                      ) AS x
                  ) AS x
              ) AS x
          ) AS x
      ) AS x
  ) AS x 
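As a rough illustration of what the nested subqueries do to a single value, here is a Python sketch that applies the same replacements in the same order (innermost first). The function name `less_rough_word_count` is made up for this example, and, like the query, it returns a count per row rather than a total.

```python
def less_rough_word_count(text: str) -> int:
    """Mirror the nested REPLACE / TRIM / CONCAT query for one row."""
    # Innermost subquery first: runs of 33 spaces, then 17, 9, 5, 3 and 2.
    for run in (33, 17, 9, 5, 3, 2):
        text = text.replace(" " * run, " ")
    # TRIM the ends, then append one space so each word is followed by a space.
    text = text.strip(" ") + " "
    # Count the spaces: one per word.
    return len(text) - len(text.replace(" ", ""))


print(less_rough_word_count(" too   late  to the     party "))  # 5
print(less_rough_word_count("a    b"))                          # 2
```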

Upvotes: 1
