GeoBeez
GeoBeez

Reputation: 1014

how to remove my stop words from a string column in postgresql

I have a table with a string column. I want to remove the stop words. I used this query which seems Ok.

SELECT to_tsvector('english',colName)from tblName order by colName asc;
  1. it does not update the column in table

  2. I want to see the stop words of Postgresql and what the query found.Then in case I can replace it with my own file. I also checked this address and could not find the stop words list file. Actually, the address does not exist.

    $SHAREDIR/tsearch_data/english.stop
    

Upvotes: 1

Views: 4972

Answers (2)

Tuấn Phạm Văn
Tuấn Phạm Văn

Reputation: 1

The chosen answer did not match my requirement, but I found a solution for this:

SELECT regexp_replace('Bill and Susi, hand over or die!', '[^ ]*$','');

regexp_replace
-----------------------------
Bill and Susi, hand over or 
(1 row)

Upvotes: 0

Laurenz Albe
Laurenz Albe

Reputation: 246848

There is no function to do that.

You could use something like this (in this example in German):

SELECT array_to_string(tsvector_to_array(to_tsvector('Hallo, Bill und Susi!')), ' ');
 array_to_string
-----------------
 bill hallo susi
(1 row)

This removes stop words, but also stems and non-words, and it does not care about word order, so I doubt that the result will make you happy.

If that doesn't fit the bill, you can use regexp_replace like this:

SELECT regexp_replace('Bill and Susi, hand over or die!', '\y(and|or|if)\y', '', 'g');
       regexp_replace
-----------------------------
 Bill  Susi, hand over  die!
(1 row)

But that requires that you include your list of stop words in the query string. An improved version would store the stop words in a table.

Upvotes: 3

Related Questions