Reputation: 877
I trying to get counts of words to generate a word cloud type view from my PostgreSQL db. I was hoping to take advantage of the built in efficiencies of ts_vectors. I see that there is a ts_stats function that returns a list of lexemes and their frequencies, given a ts_vector column.
The rows returned look something like this:
word ndoc nentry
studi 322 43434
compani 233 2322
And the query looks like this:
SELECT * FROM ts_stat('SELECT tsv from doc')
ORDER BY nentry DESC, ndoc DESC, word
LIMIT 10
However, I need the full word, not just the lexemes. Is there a way to convert a lexeme to one of its full-word representations with PostgreSQL? You can easily convert a word to its lexeme, but I can't figure out how to do the reverse.
Thanks.
Upvotes: 4
Views: 3662
Reputation: 1551
It's not possible. You can store the initial text into a separate column.
A lexeme is a normalized representation (using dictionary or stemmer) of a word, so it can match several forms: lexeme 'walk'
matches 'walks'
and also 'walking'
, so the original form cannot be determined.
Upvotes: -1
Reputation: 39
Change the configuration of your ts_vector to 'simple'. the default is 'english' which does the stemming e.g.
SELECT * FROM ts_stat('SELECT to_tsvector(''freds company company company hello ridiculous'')');
outputs counts for ridicul, compani, hello, fred
with the 'simple' configuration no stemming occurs
SELECT * FROM ts_stat('SELECT to_tsvector(''simple'',''freds company company company hello ridiculous'')');
outputs counts for : ridiculous,hello,freds,company
Upvotes: 3