Reputation: 22923
So, after the recent Twitch leak everyone is discussing their favorite bits of bad code. One bit that stood out was a single monster SQL statement to spot "illegal terms". That got me thinking about how to implement this "correctly".
So you have a table of strings that you want to match sub-strings against, how you write this in PL/pgSQL? I am assuming any SQL implementation should have procedural capabilities to create functions/procedures for such meta-programming, basically creating and executing SQL like the following:
admin@localhost:words> SELECT 'XabcY' LIKE '%abc%' OR 'XabcY' LIKE '%xyz%' as matches;
+-----------+
| matches |
|-----------|
| True |
+-----------+
So to be more specific, given a list of strings in table disallowed
:
| illegal_string |
|----------------|
| stupid |
| witless |
| moron |
| commie-lover |
How would you create a dynamic query in PL/pgSQL that when executed returned true if any of these matched a given string? It does not need to use ILIKE
in the Twitch like to check if the given word contained the substring, so using position
is fine too, but it should be performant/tuneable using gin
indices and whatnot.
Upvotes: 2
Views: 1754
Reputation: 13049
I do not think that you need PL/pgSQL or dynamic SQL for this. Make a set of text from the table of disallowed words and match against the set items as regular expressions. Maybe not very performant using regexp magic, but I hope straightforward. The query below can of course be parameterized.
select '<text to examine>' ~*
any(select illegal_string from disallowed) as rude;
select 'You Stupido MORONE!' ~*
any(select illegal_string from disallowed) as rude;
-- yields true.
You may wish to restrict the search to whole words only. Then play with the regexp set and shape it like this:
select 'You Stupido MORONE!' ~*
any(select '\m'||illegal_string||'\M' from disallowed) as rude;
-- yields false
SQL Fiddle here
Upvotes: 1
Reputation: 45795
There are two possibilities:
LIKE ANY(array)
operatorpostgres=# select 'Ahoj' like any (ARRAY['Ah%', 'Na%']);
┌──────────┐
│ ?column? │
╞══════════╡
│ t │
└──────────┘
(1 row)
postgres=# select 'Nazdar' like any (ARRAY['Ah%', 'Na%']);
┌──────────┐
│ ?column? │
╞══════════╡
│ t │
└──────────┘
(1 row)
postgres=# select 'Ahoj' ~ '^(Ah|Na)';
┌──────────┐
│ ?column? │
╞══════════╡
│ t │
└──────────┘
(1 row)
postgres=# select 'Nazdar' ~ '^(Ah|Na)';
┌──────────┐
│ ?column? │
╞══════════╡
│ t │
└──────────┘
(1 row)
So in the end you don't need dynamic SQL.
There are ANSI/SQL syntax too:
postgres=# select 'Nazdar' similar to '(Ah|Na)%';
┌──────────┐
│ ?column? │
╞══════════╡
│ t │
└──────────┘
(1 row)
So you can write something like:
DECLARE pw text[];
BEGIN
pw := (SELECT array_agg('%' || disallowed || '%'
FROM disallowed);
IF EXISTS(SELECT * FROM foo WHERE c LIKE ANY (pw)) THEN
RAISE NOTICE 'there are some disallowed words';
END IF;
...
I am not sure about an performance. On larger table you need trigram index, or maybe better is using fulltext instead substring searching.
Upvotes: 1