Reputation: 20962
I have 2 columns in a table and I would like to roughly report on the total number of words. Is it possible to run a MySQL query and find out the total number of words down a column.
It would basically be any text separated by a space or multiple space. Doesn't need to be 100% accurate as its just a general guide.
Is this possible?
Upvotes: 6
Views: 15046
Reputation: 3575
Try something like this:
SELECT COUNT(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1)
FROM table
This will count the number of caracters in your column, and substracts the number of caracters in your column removing all the spaces. Hereby you know how many spaces you have in your row and hereby know how many words there are (roughly because you can also type in a double space, this wil count as two words but you say you want it roughly so this should suffice).
Upvotes: 17
Reputation: 7124
I stumbled upon this post while I was looking for an answer myself and truthfully I've tested all of the answers here and the closest one was @fikre's answer. However, I have concern over data that have leading spaces and/or extra spaces between the words (trailing spaces doesn't seem to have effect to fikre's query during my testing). So, I'm looking for a way to identify any spaces in between words and remove them. While I found a few answers using advanced function (which is beyond my skill set), I did find a very simple way to do it.
tl;dr > @fikre's answer is the only one working for me but I did a minor tweak to ensure that I'll get the most accurate word count.
Query 1 -- This will return 5 "Word Count"
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count" FROM
(SELECT TRIM(REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ')) AS input
FROM (SELECT ' too late to the party ' AS input) i) r;
Query 2 -- This will return 13 "Word Count"
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count"
FROM (SELECT ' too late to the party ' AS input) i;
-- breakdown ' too late to the party '
1 leading space= 1 word count
2 spaces after the first space from the word 'too'= 2 word count
1 space after the first space from the word 'late'= 1 word count
4 spaces after the first space from the word 'the'= 4 word count
trailing space(s) wasn't counted at all.
Total spaces > 1+2+1+4=8 + 5 word count = 13
So, basically if the data row contains even a million spaces in between (disclaimer: an assumption. I've only tested 336,896 spaces), Query 1
will still return Word count=5.
Note: The mid part REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ')
I took from this answer https://stackoverflow.com/a/55476224/10910692
Upvotes: 0
Reputation: 161
Count simply gives you the number of found rows. You need to use SUM instead.
SELECT SUM(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1) FROM table
Upvotes: 16
Reputation: 115660
A less rough count:
SELECT LENGTH(column) - LENGTH(REPLACE(column, SPACE(1), ''))
FROM
( SELECT CONCAT(TRIM(column), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(2), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(3), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(5), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(9), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(17), SPACE(1)) AS column
FROM
( SELECT REPLACE(column, SPACE(33), SPACE(1)) AS column
FROM tableX
) AS x
) AS x
) AS x
) AS x
) AS x
) AS x
) AS x
Upvotes: 1