Hristian Yordanov
Hristian Yordanov

Reputation: 668

Import CSV in postgresql without duplicate rows

My query duplicates rows when I import CSV file in to table:

+----------+----------------+---------------------+
| fullname | formattedvalue | recordTime          |
+----------+----------------+---------------------+
| text1    | 170.01346      | 09/02/2020 21:45:00 |
+----------+----------------+---------------------+
| text2    | 24.153432536   | 09/02/2020 21:45:00 |
+----------+----------------+---------------------+
| text3    | 3.583432424    | 09/02/2020 21:45:00 |
+----------+----------------+---------------------+
| text1    | 170.01346      | 08/02/2020 21:45:00 |
+----------+----------------+---------------------+
| text2    | 24.153432536   | 08/02/2020 21:45:00 |
+----------+----------------+---------------------+
| text3    | 3.583432424    | 08/02/2020 21:45:00 |
+----------+----------------+---------------------+

And the Query:

CREATE TEMP TABLE tmp_x
(
"fullname" varchar,
"formattedvalue" double precision,
"recordtime" timestamp
);

COPY tmp_x FROM PROGRAM 'more +1 "D:\MEAS_20200308x.csv"' (FORMAT csv, DELIMITER ',');

--UPDATE tmp_x
--SET    formattedvalue = ROUND( CAST(formattedvalue as numeric), 3 );

insert into meas_kanal select * from (
select x.*
from tmp_x x
left outer join meas_kanal t on t.fullname = x. fullname AND t. recordtime = x. recordtime
where t. fullname is null AND  t. recordtime is null
) as missing;


DROP TABLE tmp_x;

My logic is to check for duplicates on cloumn combination: fullname + recordtime

When I start the query again, it is inserting the same rows again.

Any idea where I am wrong?

EDIT 2:

I tried and this with the same problem:

INSERT INTO meas_kanal
SELECT x.*
FROM tmp_x x
    LEFT OUTER JOIN meas_kanal t ON (t. fullname = x. fullname AND t. recordtime = x. recordtime)
WHERE t.fullname IS NULL AND t. recordtime IS NULL;

EDIT 3: One more fail.

INSERT INTO meas_kanal
SELECT *
FROM tmp_x
WHERE NOT EXISTS(SELECT * 
                 FROM meas_kanal 
                 WHERE (tmp_x.fullname=meas_kanal.fullname and
                       tmp_x.recordtime=meas_kanal.recordtime)
                 );

I think that the problem is somewhere else.

Edit 4: Possible solution

BTW I forgot to mention. I don't have a primary key. Now I make two:

CREATE TEMP TABLE tmp_x
(
"fullname" varchar,
"formattedvalue" double precision,
"recordtime" timestamp,
UNIQUE (fullname, recordtime)
);

And this insert:

insert into meas_kanal(fullname, formattedvalue,recordtime) 
    SELECT fullname, formattedvalue,recordtime FROM tmp_x x
ON CONFLICT DO NOTHING;

For now it's working like I expect. I will write this solution as an answer if no one gives a better solution.

Upvotes: 0

Views: 1214

Answers (2)

Hristian Yordanov
Hristian Yordanov

Reputation: 668

I make two primary keys:

CREATE TEMP TABLE tmp_x
(
"fullname" varchar,
"formattedvalue" double precision,
"recordtime" timestamp,
UNIQUE (fullname, recordtime)
);

And this insert:

insert into meas_kanal(fullname, formattedvalue,recordtime) 
    SELECT fullname, formattedvalue,recordtime FROM tmp_x x
ON CONFLICT DO NOTHING;

For now it's working like I expect.

Upvotes: 2

SILENT
SILENT

Reputation: 4268

You could use GROUP BY or DISTINCT

Assuming your current query results with the table you displayed

INSERT INTO meas_kanal
SELECT DISTINCT x.fullname, x.formattedvalue, x.recordtime
FROM tmp_x x

Update 1

Based on your recent comments, if you want the latest recordtime to be inserted, then use GROUP BY

INSERT INTO meas_kanal
SELECT x.fullname, x.formattedvalue, MAX(x.recordtime) recordtime
FROM tmp_x x
GROUP BY x.fullname, x.formattedvalue

Upvotes: 0

Related Questions