Jeff Kyzer
Jeff Kyzer

Reputation: 646

Delete row if type cast fails

Ok, here is the layout:
I have a bunch of uuid data that is in varchar format. I know uuid is its own type. This is how I got the data. So to verify this which ones are uuid, I take the uuid in type varchar and insert it into a table where the column is uuid. If the insert fails, then it is not a uuid type. My basic question is how to delete the bad uuid if the insert fails. Or, how do I delete out of one table if an insert fails in another table.
My first set of data:

drop table if exists temp1;
drop sequence if exists temp1_id_seq;
CREATE temp table temp1 (id serial, some_value varchar);
INSERT INTO temp1(some_value) 
SELECT split_part(name,':',2) FROM branding_resource WHERE name LIKE '%curric%';
create temp table temp2 (id serial, other_value uuid);

CREATE OR REPLACE function verify_uuid() returns varchar AS $$
DECLARE uu RECORD;
BEGIN
FOR uu IN  select * from temp1 
LOOP
    EXECUTE 'INSERT INTO temp2 values ('||uu.id||','''|| uu.some_value||''')';
END LOOP;
END;
$$
LANGUAGE 'plpgsql' ;
select verify_uuid();

When I run this, I get the error

ERROR: invalid input syntax for uuid:

which is what I expect. There are some bad uuids in my data set.

My research led me to Trapping Errors - Exceptions with UPDATE/INSERT in the docs.
Narrowing down to the important part:

BEGIN
FOR uu IN  select * from temp1 
LOOP
    begin
    EXECUTE 'INSERT INTO temp2 values ('||uu.id||','''|| uu.some_value||''')';
    return;
    exception when ??? then delete from temp1 where some_value = uu.some_value;
    end;
END LOOP;
END;

I do not know what to put instead of ???. I think it relates to the ERROR: invalid input syntax for uuid:, but I am not sure. I am actually not even sure if this is the right way to go about this?

Upvotes: 2

Views: 981

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657022

@Craig shows a way to identify the SQLSTATE.
You an also use pgAdmin, which shows the SQLSTATE by default:

SELECT some_value::uuid FROM temp1

> ERROR: invalid input syntax for uuid: "-a0eebc999c0b4ef8bb6d6bb9bd380a11"
> SQL state: 22P02

I am going to address the bigger question:

I am actually not even sure if this is the right way to go about this?

Your basic approach is the right way: the 'parking in new york' method (quoting Merlin Moncure in this thread on pgsql-general). But the procedure is needlessly expensive. Probably much faster:

Exclude obviously violating strings immediately.

You should be able to weed out the lion's share of violating strings with a much cheaper regexp test.
Postgres accepts a couple of different formats for UUID in text representation, but as far as I can tell, this character class should covers all valid characters:

'[^A-Fa-f0-9{}-]'

You can probably narrow it down further for your particular brand of UUID representation (Only lower case? No curly braces? No hyphen?).

CREATE TEMP TABLE temp1 (id serial, some_value text);

INSERT INTO temp1 (some_value) 
SELECT split_part(name,':',2)
FROM   branding_resource
WHERE  name LIKE '%curric%'
AND    split_part(name,':',2) !~ '[^A-Fa-f0-9{}-]';

"Does not contain illegal characters."

Cast to test the rest

Instead of filling another table, it should be much cheaper to just delete (the now few!) violating rows:

CREATE OR REPLACE function f_kill_bad_uuid()
  RETURNS void AS
$func$
DECLARE
   rec record;
BEGIN
FOR rec IN
   SELECT * FROM temp1 
LOOP
   BEGIN
   PERFORM rec.some_value::uuid;             -- no dynamic SQL needed
                                             -- do not RETURN! Keep looping.
   RAISE NOTICE 'Good: %', rec.some_value;   -- only for demo
   EXCEPTION WHEN invalid_text_representation THEN
      RAISE NOTICE 'Bad: %', rec.some_value; -- only for demo
      DELETE FROM temp1 WHERE some_value = rec.some_value;
   END;
END LOOP;
END
$func$ LANGUAGE plpgsql;
  • No need for dynamic SQL. Just cast. Use PERFORM, since we are not interested in the result. We just want to see if the cast goes through or not.

  • Not return value. You could count and return the number of excluded rows ...

  • For a one-time operation you could also use a DO statement.

  • And do not quote the language name 'plpgsql'. It's an identifier, not a string.

SQL Fiddle.

Upvotes: 1

Craig Ringer
Craig Ringer

Reputation: 324521

You can get the SQLSTATE code from psql using VERBOSE mode, e.g:

regress=> \set VERBOSITY verbose
regress=> SELECT 'fred'::uuid;
ERROR:  22P02: invalid input syntax for uuid: "fred"
LINE 1: SELECT 'fred'::uuid;
               ^
LOCATION:  string_to_uuid, uuid.c:129

Here we can see that the SQLSTATE is 22P02. You can use that directly in the exception clause, but it's generally more readable to look it up in the manual to find the text representation. Here, we see that 22P02 is invalid_text_representation.

So you can write exception when invalid_text_representation then ...

Upvotes: 2

Related Questions