Reputation: 2859
I have a table of many columns in Postgresql
database. Some of these columns are in text type and have several rows of values. The values also recur. I would like to change these text values with unique integer values.
This is my table column:
Country_Name
------------
USA
Japan
Mexico
USA
USA
Japan
England
and the new column I want is:
Country_Name
------------
1
2
3
1
1
2
4
Each country name is assigned (mapped) to a unique integer and all the recurrences of the text is replaced with this number. How can I do this?
Edit 1: I want to replace my column values on the fly if possible. I don't actually need another column to keep the names but it would be nice to see the actual values too. Is it possible to do:
country_id
with the same values of country_name
column in the same tablecountry_id
replace each name with a unique integer with an update statement or procedure without requiring a new table or dictionary or map.I don't know if this is possible but this will speed up things because I have a total of 220 columns and millions of rows. Thank you.
Upvotes: 0
Views: 775
Reputation: 28263
assuming the country_name
column is in a table called country_data
create a new table & populate with unique country_names
-- valid in pg10 onwards
-- for earlier versions use SERIAL instead in the PK definition
CREATE TABLE countries (
country_id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
country_name TEXT);
INSERT INTO countries (country_name)
SELECT DISTINCT country_name
FROM country_data;
alter the table country_data
& add a column country_id
ALTER TABLE country_data ADD COLUMN country_id INT
Join country_data
to countries
and populate the country_id
column
UPDATE country_data
SET country_id = s.country_id
FROM countries
WHERE country_data.country_name = countries.country_name
At this point the country_id
is available to query, but a few following actions may be recommended depending on the use case:
country_data.country_id
as a foreign key referring to countries.country_id
country_data.country_name
as that's redundant through the relationship with countries
country_data.country_id
if you determine that it will speed up the queries you normally run on this table.Upvotes: 3