iso_9001_
iso_9001_

Reputation: 2859

PostgreSQL Replace Column Data with Unique Integers

I have a table of many columns in Postgresql database. Some of these columns are in text type and have several rows of values. The values also recur. I would like to change these text values with unique integer values.

This is my table column:

Country_Name
------------
USA
Japan
Mexico
USA
USA
Japan
England

and the new column I want is:

Country_Name
------------
1
2
3
1
1
2
4

Each country name is assigned (mapped) to a unique integer and all the recurrences of the text is replaced with this number. How can I do this?

Edit 1: I want to replace my column values on the fly if possible. I don't actually need another column to keep the names but it would be nice to see the actual values too. Is it possible to do:

I don't know if this is possible but this will speed up things because I have a total of 220 columns and millions of rows. Thank you.

Upvotes: 0

Views: 775

Answers (1)

Haleemur Ali
Haleemur Ali

Reputation: 28263

assuming the country_name column is in a table called country_data

create a new table & populate with unique country_names

-- valid in pg10 onwards
-- for earlier versions use SERIAL instead in the PK definition
CREATE TABLE countries (
   country_id INT  GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
   country_name TEXT); 
INSERT INTO countries (country_name) 
SELECT DISTINCT country_name
FROM country_data;

alter the table country_data & add a column country_id

ALTER TABLE country_data ADD COLUMN country_id INT

Join country_data to countries and populate the country_id column

UPDATE country_data
SET country_id = s.country_id
FROM countries
WHERE country_data.country_name = countries.country_name

At this point the country_id is available to query, but a few following actions may be recommended depending on the use case:

  • set up country_data.country_id as a foreign key referring to countries.country_id
  • drop the column country_data.country_name as that's redundant through the relationship with countries
  • maybe create an index on country_data.country_id if you determine that it will speed up the queries you normally run on this table.

Upvotes: 3

Related Questions