Reputation: 45
I have a table with a random auto-generated id (primary key). I am trying to avoid the insertion of duplicate rows.
Example of a duplicate row:
id | field a | field b | field c |
1 4 6 7
2 4 6 7
The key (id) is not duplicate since it is generated with uuid, but all other fields are identical.
I guess I'm looking for somehting like this but in BigQuery language: Avoiding inserting duplicate rows in mySQL
Upvotes: 1
Views: 2392
Reputation: 7744
You can use insert into ... where not exists ...
query, and it is fine if you do it rarely. But it is kind of anti-pattern if you do it often.
This query needs to scan the table the row is inserted into, so it might get slow and expensive as this table becomes larger. Partitioning and clustering might help, but still if you insert a lot of rows one at a time, this might get costly.
A more common approach is to insert anything, and periodically do deduplication.
Upvotes: 2
Reputation: 1270091
You can use not exists
if you want to avoid inserting duplicate rows into the table:
insert into t (id, a, b, c)
select i.*
from (select 2 as id, 4 as a, 6 as b, 7 as c) i
where not exists (select 1
from t
where t.a = i.a and t.b = i.b and t.c = i.c
);
Upvotes: 1
Reputation: 81
To help protect your table against duplication, set the insertId property when sending your request.
BigQuery uses the insertId property for de-duplication.
new BigQueryInsertRow(insertId: "row1"){
...
},
new BigQueryInsertRow(insertId: "row2") {
...
}
Upvotes: -1