tim_xyz
tim_xyz

Reputation: 13511

Postgresql: exclude records where a specific character in text is not prefaced by whitespace

I'd like to exclude records that have a ">" with no whitespace directly infront of the > character.

So return records like:

" >"

"I worked on deals >50MM"

But not like:

">"

"I like accoun>ng"

While parsing and saving PDFs, some characters we're incorrectly converted into > and I'd like to skip these records, but not records where > was legimately used. Legitimate cases generally have a whitespace, " " infront of the ">".

Current (but problematic) solution:

select 
    id, content
from uploads 
where (
    content ilike '%>%' and
    content ilike '% >%'
) or (
    content not ilike '%>%'
)

The problem with this solution is that it will select records that have both ">" and " >". But I'd like to exclude everything missing a whitespace in front of ">".

Hopefully this question is not too confusing.

Upvotes: 1

Views: 841

Answers (2)

tonypdmtr
tonypdmtr

Reputation: 3225

Another possibility:

select id,content from uploads
  where id not in (select id from uploads where content like '%>%'
                   except
                   select id from uploads where content like '% >%')

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270431

Just use regular expressions:

where not content ~ ' >'

or perhaps:

where content ~ '[^ ]>'

Upvotes: 1

Related Questions