Wildan Muhlis
Wildan Muhlis

Reputation: 1613

Search text between symbol

I have this text (taken from concatenated field row)

Astronomic Event 2013/1434H - Aceh ....

How do We search it by 2013 or 1434h keywords?

I have tried below code but it resulting no row.

to_tsvector result:

'2013/1434h':8,12 'aceh':1 'bin.....

Sample Case:

WITH sample_table as
  (SELECT to_tsvector('Astronomic Event 2013/1434H - Aceh') sample_content)

SELECT *
    FROM sample_table, to_tsquery('2013') q
WHERE sample_content @@ q

Upvotes: 1

Views: 721

Answers (1)

Denis de Bernardy
Denis de Bernardy

Reputation: 78503

How do We search it by 2013 or 1434h keywords?

It seems like you want to replace:

to_tsquery('1434h') q

with:

to_tsquery('1434h | 2013') q

http://www.postgresql.org/docs/current/static/functions-textsearch.html

Side note: the to_tsquery() syntax is extremely capricious. It doesn't allow for much if any fantasy, and many of the assumptions in Postgres are everything but end-user friendly.

More often than not, you'll be better off using plainto_tsquery(), which allows any amount of garbage to be thrown at it. Thus, consider pre-processing the string before issuing the query. For instance, you could split the string, and OR the original parts together:

where sc.text_index @@ (plainto_tsquery('1434h') || plainto_tsquery('2013'))

Doing so will make your code a bit more complex, but it won't rely on your users needing to understand that (contrary to what they're accustomed to in Google) they should enter 'quick & brown & fox & jumps & lazy & dog' instead of plain 'The quick brown fox jumps over the lazy dog'.


Edit: I ended up actually trying your sample query, and it seems you're actually running into a parser issue:

# SELECT alias, description, token FROM ts_debug('Astronomic Event 2013/1434H - Aceh');
   alias   |    description    |   token    
-----------+-------------------+------------
 asciiword | Word, all ASCII   | Astronomic
 blank     | Space symbols     |  
 asciiword | Word, all ASCII   | Event
 blank     | Space symbols     |  
 file      | File or path name | 2013/1434H
 blank     | Space symbols     |  
 blank     | Space symbols     | - 
 asciiword | Word, all ASCII   | Aceh
(8 rows)

http://www.postgresql.org/docs/current/static/textsearch-parsers.html

It looks like you might need to write (or find) and configure an app-specific parser. Having never done this personally, the best I can do is to highlight that Postgres allows this and includes a sample:

http://www.postgresql.org/docs/current/static/test-parser.html

Alternatively, change your tsvector-related trigger so that it matches e.g. \d{4}/\d+[a-zA-Z] or whatever seems most appropriate, and adds spaces accordingly, before converting it to a tsvector. Something as simple as the following might do the trick if you never need to store file names:

SELECT alias, description, token
FROM ts_debug(replace('Astronomic Event 2013/1434H - Aceh', '/', ' / '));

Upvotes: 1

Related Questions