Reputation: 775
Input
Word-Word, Some other words and this is another word et another one
Expected output
Word-Word
Some
other
words
this
is
another
word
another
one
I have a table (t) with many strings like the one shown in the input.
I'm trying to extract every word in each sentence, excluding the commas (','), the words 'and', 'et' and 'und', and of course any whitespace or sequence of whitespace between words.
Regexes that I'm using:
- \\s+
- ((\b[^\s]+\b)((?<=\.\w).)?) - doesn't work in postgres for some reason
- ^(?!et$|and$|und$) - doesn't work either
Query that I'm running:
SELECT word FROM t,
unnest(regexp_split_to_array(t.word, E'Missing expression')) as word;
Upvotes: 1
Views: 1374
Reputation: 626689
Instead of splitting, you can extract the words you want with regexp_matches:
SELECT regexp_matches(
'Word-Word, Some other words and this is another word et another one ',
E'\\y(?!(?:et|[ua]nd)\\y)\\w+(?:-\\w+)*',
'g');
See the online demo. Regex details:
- \y - a word boundary
- (?!(?:et|[ua]nd)\y) - a negative lookahead that fails the match if there is et, und or and as a whole word immediately to the right of the current location
- \w+(?:-\w+)* - one or more word chars, then zero or more occurrences of - followed by one or more word chars
See the regex demo (converted to PCRE).
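To run the same pattern against the table from the question rather than a string literal, something along these lines should work. This is only a sketch: it assumes the column is t.word, as in the question's query, and the CTE is a stand-in for the real table so that the snippet runs by itself. Since the pattern has no capturing groups, each match comes back as a one-element array, so m[1] is the whole word.

```sql
-- Stand-in for the real table t (assumption: the text column is named "word").
WITH t(word) AS (
    VALUES ('Word-Word, Some other words and this is another word et another one')
)
SELECT m[1] AS word          -- first (and only) array element = the whole match
FROM t
CROSS JOIN LATERAL regexp_matches(
    t.word,
    E'\\y(?!(?:et|[ua]nd)\\y)\\w+(?:-\\w+)*',
    'g'                      -- return every match, one row per word
) AS m;
```

With the real table in place, drop the WITH clause and keep the SELECT as it is. Note that the match is case-sensitive, so 'And' or 'Et' at the start of a sentence would still be returned.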
Upvotes: 1