jlandercy
jlandercy

Reputation: 10992

Quoted NULL values makes PostgreSQL COPY command fail

I have a big CSV file with all columns quoted with ". Null values are represented as "", and column separator is |. I would like to use the COPYcommand from postgresql (version 10.7) to ingest those files. I have tried many combination, but the more natural to me is the following:

COPY test.large
FROM '/path/to/big.file'
WITH (
    FORMAT CSV,
    HEADER,
    DELIMITER '|',
    QUOTE '"',
    NULL ''
);

My underlying table expects a column for integers and there are lines where NULL value are set to ...|""|... instead of some number ...|"123456"|.... And unfortunately, this makes COPY to crash, saying:

ERREUR:  syntaxe en entrée invalide pour l'entier : «  »
CONTEXT:  COPY regpat_pct_app_reg, ligne 2743, colonne appid : «  »

Sorry, it is in French from this terminal. Anyway, it says: Invalid syntax for integer : « » At the line 2743 we found:

...000205"|""|"XY...

Which is a NULL value, but I cannot found how to properly setup COPY command switch to make postgresql ingest those files.

The documentation says:

NULL

Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.

I am aware that I can clean the file by changing ...|""|... to ...||... using sed or some regexp/replace operations. This will solve my problem, I have checked it.

What I am wondering: is it possible to bear it with postgresql, after all this is perfectly valid CSV format.

Update

Reading the following post as suggested, I wrote:

COPY test.large
FROM '/path/to/big.file'
WITH (
        FORMAT CSV,
        HEADER,
        DELIMITER '|',
        QUOTE '"',
        NULL '',
        FORCE_NULL appid
    );

I get:

ERREUR:  l'argument de l'option « force_null » doit être une liste de noms de colonnes

Which translates to force_null must be a list of column names. Any idea?

Upvotes: 0

Views: 4236

Answers (2)

jlandercy
jlandercy

Reputation: 10992

It seems it is a slight inconsistency in the COPY command switches, because the following call:

COPY test.large
FROM '/path/to/big.file'
WITH CSV HEADER DELIMITER '|' QUOTE '"' FORCE NULL appid;

Works as expected or you must add the _ if you use the key value pair notation:

COPY test.large
FROM '/path/to/big.file'
WITH (
    FORMAT CSV,
    HEADER,
    DELIMITER '|',
    QUOTE '"',
    FORCE_NULL (appln_id)
);

Upvotes: 2

Caius Jard
Caius Jard

Reputation: 74605

This isn't NULL, it's an empty string which is a different thing. I'm not aware of a way to make PG treat an empty string as a null number. I recommend you do your import to a temporary table with a text column instead of an integer here, then move the data to the main table converting to number as appropriate

Take a look at this Q: Postgresql COPY empty string as NULL not work

Very similar to your scenario and the accepted answer there describes a similar technique

Upvotes: 1

Related Questions