Reputation: 11
I have a column that stores the bio/title of users. It's custom written by users and can have as many words.
id title
1 Business Development Executive Cold Calling & Cold Emailing expert Entrepreneur
2 Director of Online Marketing and entrepreneur
3 Art Director and Entrepreneur
4 Corporate Development at Yahoo!
5 Snr Program Manager, Yahoo
I am trying to figure out a mysql query that shows word frequencies:
Entrepreneur 3
development 2
director 2
I know if I could return each word in the value as a separate row, I could then use normal grouping. I have looked, but can't find, a function that splits text into words each in a separate row.
Can it be done?
Upvotes: 1
Views: 482
Reputation: 425208
You can do it by joining with a manufactured number series that is used to pick out the nth word. Unfortunately, mysql has no built-in method if generating a series, so it's a little ugly, but here it is:
select
substring_index(substring_index(title, ' ', num), ' ', -1) word,
count(*) count
from job j
join (select 1 num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 10 union select 11 union select 12) n
on length(title) >= length(replace(title, ' ', '')) + num - 1
group by 1
order by 2 desc
See a live demo on SQLFiddle using your data and producing your expected output.
Sadly, the limitation of having to hard code every value of the number series also limits the number of words of the column that will be processed (in this case 12). It doesn't matter if there are too many numbers in the series, and you can always add more to cover a larger expected input text.
Upvotes: 4
Reputation: 335
Try selecting all job titles and returning it as an array. Then do something like this in php:
<?php
$array = array("Business Development Executive Cold Calling & Cold Emailing expert Entrepreneur ", "Director of Online Marketing and entrepreneur", "Art Director and Entrepreneur", "Corporate Development at Yahoo!", "Snr Program Manager, Yahoo");
$words = "";
foreach($array as $val) $words .= " ".strtolower($val);
print_r(array_count_values(str_word_count($words, 1)));
?>
Will output:
Array ( [business] => 1 [development] => 2 [executive] => 1 [cold] => 2 [calling] => 1 [emailing] => 1 [expert] => 1 [entrepreneur] => 3 [director] => 2 [of] => 1 [online] => 1 [marketing] => 1 [and] => 2 [art] => 1 [corporate] => 1 [at] => 1 [yahoo] => 2 [snr] => 1 [program] => 1 [manager] => 1 )
Upvotes: 0