Reputation: 10992
I have a big CSV file with all columns quoted with "
.
Null values are represented as ""
, and column separator is |
.
I would like to use the COPY
command from postgresql (version 10.7) to ingest those files.
I have tried many combination, but the more natural to me is the following:
COPY test.large
FROM '/path/to/big.file'
WITH (
FORMAT CSV,
HEADER,
DELIMITER '|',
QUOTE '"',
NULL ''
);
My underlying table expects a column for integers and there are lines where NULL
value are set to ...|""|...
instead of some number ...|"123456"|...
. And unfortunately, this makes COPY
to crash, saying:
ERREUR: syntaxe en entrée invalide pour l'entier : « »
CONTEXT: COPY regpat_pct_app_reg, ligne 2743, colonne appid : « »
Sorry, it is in French from this terminal. Anyway, it says: Invalid syntax for integer : « »
At the line 2743 we found:
...000205"|""|"XY...
Which is a NULL
value, but I cannot found how to properly setup COPY
command switch to make postgresql ingest those files.
The documentation says:
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
I am aware that I can clean the file by changing ...|""|...
to ...||...
using sed
or some regexp/replace operations. This will solve my problem, I have checked it.
What I am wondering: is it possible to bear it with postgresql, after all this is perfectly valid CSV format.
Update
Reading the following post as suggested, I wrote:
COPY test.large
FROM '/path/to/big.file'
WITH (
FORMAT CSV,
HEADER,
DELIMITER '|',
QUOTE '"',
NULL '',
FORCE_NULL appid
);
I get:
ERREUR: l'argument de l'option « force_null » doit être une liste de noms de colonnes
Which translates to force_null must be a list of column names
. Any idea?
Upvotes: 0
Views: 4236
Reputation: 10992
It seems it is a slight inconsistency in the COPY
command switches, because the following call:
COPY test.large
FROM '/path/to/big.file'
WITH CSV HEADER DELIMITER '|' QUOTE '"' FORCE NULL appid;
Works as expected or you must add the _
if you use the key value pair notation:
COPY test.large
FROM '/path/to/big.file'
WITH (
FORMAT CSV,
HEADER,
DELIMITER '|',
QUOTE '"',
FORCE_NULL (appln_id)
);
Upvotes: 2
Reputation: 74605
This isn't NULL, it's an empty string which is a different thing. I'm not aware of a way to make PG treat an empty string as a null number. I recommend you do your import to a temporary table with a text column instead of an integer here, then move the data to the main table converting to number as appropriate
Take a look at this Q: Postgresql COPY empty string as NULL not work
Very similar to your scenario and the accepted answer there describes a similar technique
Upvotes: 1