Nitin
Nitin

Reputation: 35

PostgreSQL: Search partial words in specific positions

I am trying to locate the name of a song with some distinct particulars. These are the criteria:

  1. Song name is made up of 4 words
  2. The first alphabet of each word is known (B% A% S% H%)

I have the Musicbrainz data as a postgres database which has a huge list of songs that I can query. I've tried to query the data several ways but am not getting the results that I want.

For example, the code sample below:

select name from track where name ilike 'b% a% s% h%'

will get me halfway there but it does not respect the fact that I am looking for:

  1. a song title that is exactly 4 words
  2. the first alphabet of each word of the song must match the known data (B% A% S% H%)

The query above, which I know is wrong, pulls up things like:

Brothers in Arms

Body and Soul

The first one is wrong because in does not satisfy the requirement that the second word needs to begin with an "A" and there needs to be a total of 4 words.

The second one is wrong because it does not satisfy the need that there needs to be a total of 4 words, even though the first 3 are a good match.

Any pointers are appreciated!

Upvotes: 0

Views: 133

Answers (1)

sticky bit
sticky bit

Reputation: 37497

You can have a look into regular expressions. Assuming that the word boundary can only be a single space (which seems reasonable for song names), you can try:

...
WHERE name ~* '^b[^ ]* a[^ ]* s[^ ]* h[^ ]*$'
...

The first ^ matches the beginning of the string, the $ its end. [^ ] is a character class, that means any character other than a space -- ^ actually "negates" the class or, more accurate, means its complement. That is, it matches any character of the class. * is a quantifier that means that there can be an arbitrary amount of the characters defined by the class, including zero.

If there can be other white space characters than space itself of sequences of more than one space, try:

...
WHERE name ~* '^b\S*\s+a\S*\s+s\S*\s+h\S*$'
...

\S and \s are again character classes but predefined ones. \S means any character unless it's a white space character and \s any white space charcter.

Upvotes: 1

Related Questions