raddevon

Reputation: 3340

How can I change all occurrences of a particular value in any column in PostgreSQL?

I have three different values in my database that represent a null: an actual null, an empty string, and a string {x:Null}. This value appears across multiple columns.

{x:Null} is normalized on the web front-end, so all these values look exactly the same although they end up ordered differently in a sort. How can I write a query that will take these values and make them actual nulls across every column and every table?

Bonus points if you can tell me how to make sure these other empty values are always inserted as nulls going forward. (Disclaimer: I have no power to grant any actual bonus points. ;)

Upvotes: 0

Views: 2392

Answers (3)

Schwern

Reputation: 164829

You can query the information_schema to get a list of all tables and columns with a string type.

SELECT table_name, column_name
FROM   information_schema.columns
WHERE  data_type IN ('text', 'character', 'character varying')
  AND  table_schema NOT IN ('pg_catalog', 'information_schema');

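NOTE: double-check which values data_type actually holds first. PostgreSQL's information_schema reports char(n) as character and varchar(n) as character varying, but verify this against your own database.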

Then I would write a small program to update each column in each table. Here it is sketched out in Perl.

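while ( my ($table, $column) = $sth->fetchrow_array ) {
    # quote_identifier returns a safely double-quoted PostgreSQL identifier
    my $q_table  = $dbh->quote_identifier($table);
    my $q_column = $dbh->quote_identifier($column);

    # qq[] interpolates the quoted identifiers; q[] would not
    $dbh->do(qq[
        UPDATE $q_table
        SET    $q_column = NULL
        WHERE  $q_column = '{x:Null}'
            OR $q_column = ''
    ]);
}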

Be sure to quote $table and $column as identifiers, as in my sample.

Going forward, you'll have to add a CHECK constraint to each and every column. You can use information_schema.columns to generate these as well. Something like

ALTER TABLE $q_table ADD CHECK ($q_column NOT IN ('{x:Null}', ''));

You could use a trigger to change the values to NULL, but I don't like data stores that silently change basic data for application purposes.

For new columns and tables, you'll have to remember to add that constraint. Same caveats about data_type apply.

However, it's probably a bad idea to say that no column can ever be an empty string. You might want to be a bit more selective.

Another thing to note: NULL is a funny thing; it's neither true nor false. You might be better off deciding that an empty string is the thing to set empty values to.

I don't think this approach is maintainable. It's scribbling an application rule all over the data layer. What if you have some data that doesn't follow that rule? And it will have to be continuously maintained for any new data schema added. Perhaps instead you should put this at your ORM layer. Or write a few stored procedures to take care of this.
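As an illustration of doing this in the application layer instead, here is a minimal Python sketch. It assumes rows pass through your own code as dictionaries before they are inserted; normalize_value and normalize_row are hypothetical helper names, not part of any ORM.

```python
# Placeholder strings that should be stored as NULL (taken from the question).
NULL_LIKE = {"", "{x:Null}"}


def normalize_value(value):
    """Return None for a null-like placeholder string, else the value unchanged."""
    if isinstance(value, str) and value in NULL_LIKE:
        return None
    return value


def normalize_row(row):
    """Normalize every value in a column-name -> value mapping."""
    return {key: normalize_value(val) for key, val in row.items()}


if __name__ == "__main__":
    # Hypothetical row; None becomes NULL when passed as a query parameter.
    print(normalize_row({"name": "Ada", "nickname": "", "note": "{x:Null}"}))
```

Calling normalize_row just before every insert or update keeps the rule in one place, and columns that legitimately need an empty string can simply bypass it.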

Upvotes: 1

juhist

Reputation: 4314

I don't think there is any query that would do this thing for every table and every column. In principle, what you want to do is

UPDATE table SET column=NULL WHERE column='' OR column='{x:Null}';

You could try selecting data from the pg_attribute and pg_class catalogs to get the names of the tables and columns, and then automatically generating the queries. Be sure to select only those columns that contain textual data.
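The generation step could be sketched in Python like this. It is only a sketch, assuming the (schema, table, column) rows have already been fetched from the catalogs; quote_ident, quote_literal and build_updates are hypothetical helper names, and the quoting follows the standard SQL rule of doubling embedded quote characters.

```python
# Placeholder strings to convert to NULL (taken from the question).
NULL_LIKE = ("", "{x:Null}")


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote a string literal, doubling embedded single quotes.

    Assumes standard_conforming_strings is on (the PostgreSQL default).
    """
    return "'" + value.replace("'", "''") + "'"


def build_updates(columns):
    """Yield one UPDATE statement per (schema, table, column) row."""
    values = ", ".join(quote_literal(v) for v in NULL_LIKE)
    for schema, table, column in columns:
        target = f"{quote_ident(schema)}.{quote_ident(table)}"
        col = quote_ident(column)
        yield f"UPDATE {target} SET {col} = NULL WHERE {col} IN ({values});"


if __name__ == "__main__":
    # Hypothetical rows standing in for the catalog query result.
    rows = [("public", "users", "nickname"), ("public", "orders", "note")]
    for stmt in build_updates(rows):
        print(stmt)
```

Printing the statements rather than executing them lets you review the list before running it inside a transaction.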

What if somebody has entered a genuine string '{x:Null}'? You would then change it into NULL.

However, you made a real mistake by letting the situation get as bad as it currently is. You should always normalize data before putting it into a database.

Upvotes: 0

Politank-Z

Reputation: 3719

Using the information_schema.columns view, write a procedural-language routine (in PL/pgSQL, for example) that iterates through all applicable tables and columns, executing UPDATE ... SET column = NULL ... WHERE column IN ('', '{x:Null}') for each eligible column.

As for inserting these values as NULL going forward, you would have to set triggers on your tables to intercept these values and replace them with NULL.
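One way to avoid writing those triggers by hand is to generate the DDL. The Python sketch below is only an illustration: build_trigger_sql and the "<table>_nullify" naming scheme are made up for this example, and EXECUTE FUNCTION assumes PostgreSQL 11 or later (older versions spell it EXECUTE PROCEDURE).

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_trigger_sql(table, columns):
    """Return DDL for a BEFORE INSERT OR UPDATE trigger that turns the
    null-like placeholders into NULL for the given text columns."""
    func = quote_ident(f"{table}_nullify")      # hypothetical naming scheme
    trig = quote_ident(f"{table}_nullify_trg")  # hypothetical naming scheme
    # NULLIF(x, y) returns NULL when x = y, so nesting handles both placeholders.
    assignments = "\n".join(
        f"    NEW.{quote_ident(c)} := NULLIF(NULLIF(NEW.{quote_ident(c)}, ''), '{{x:Null}}');"
        for c in columns
    )
    return (
        f"CREATE FUNCTION {func}() RETURNS trigger AS $$\n"
        "BEGIN\n"
        f"{assignments}\n"
        "    RETURN NEW;\n"
        "END;\n"
        "$$ LANGUAGE plpgsql;\n"
        f"CREATE TRIGGER {trig} BEFORE INSERT OR UPDATE ON {quote_ident(table)}\n"
        f"    FOR EACH ROW EXECUTE FUNCTION {func}();"
    )


if __name__ == "__main__":
    # Hypothetical table and columns.
    print(build_trigger_sql("users", ["nickname", "bio"]))
```

The generated function lists each column explicitly, so it has to be regenerated whenever text columns are added to the table.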

Upvotes: 1
