How to change default identifier {..} for arrays in postgres?

I have large .csv files which include arrays with [..] as identifier. Obviously postgres has default {..} for arrays. How can i change it to import them properly and quickly? I'm not supposed to configure the .csv files. In the below example our array is genres.

Example:

create table title_basics (
    tconst text primary key,
    titleType text not null,
    primaryTitle text not null,
    genres text ARRAY
);


copy title_basics(tconst,titleType,primaryTitle,genres)
from 'C:\....\title_basics.csv' delimiter ',' csv header;

Dataset example:

tconst,titleType,primaryTitle,genres
t0001,movie,Miss Jerry,"[""Action"",""Comedy""]"

Upvotes: 1

Views: 449

Answers (2)

Best Practice:

  1. Take the field as text from .csv
  2. Update the column and replace the stuff you don't want.
  3. With alter table transform your column to table.

Example code:

create table title_basics 
(
    ...
    genres text
);

copy title_basics(tconst,...,genres)
from 'C:...\title_basics.csv' delimiter ',' csv header;

update title_basics set genres=replace(genres,'"','');
update title_basics set genres=replace(genres,'[','');
update title_basics set genres=replace(genres,']','');
alter table title_basics
    alter genres type text[] using string_to_array(genres,',');

Upvotes: 1

Thom Brown
Thom Brown

Reputation: 2029

You can't change the default bracket type for arrays, but you can alter the input before it reaches the database. However, it will then be your responsibility to ensure the method you are using is reliable.

For example:

COPY title_basics(tconst, titletype, primarytitle, genres)
FROM PROGRAM $$powershell -Command "(gc C:\....\title_basics.csv) -replace '\[','{' -replace '\]','}'"$$
DELIMITER ',' CSV HEADER;

Disclosure: I am an EnterpriseDB (EDB) employee.

Upvotes: 0

Related Questions