Reputation: 7805
I have a BigQuery table containing data as below:
date hits_eventInfo_Category hits_eventInfo_Action session_id user_id hits_time hits_eventInfo_Label
20151021 Air Search 1445001 A232 1952 CurrentLocation
20151021 Air Search 1445001 A232 1952 CurrentLocation
20151021 Air Search 1445001 A232 1952 CurrentLocation
20151021 Air Select 1445001 A232 7380 Vendor
20151021 Air Select 1445001 A232 7380 Vendor
20151021 Air Select 1445001 A232 7380 Vendor
As you can see there are a series of duplicate records. I would like to end up with one of the duplicate records from each of the duplicate record sets. For example:
date hits_eventInfo_Category hits_eventInfo_Action session_id user_id hits_time hits_eventInfo_Label
20151021 Air Search 1445001 A232 1952 CurrentLocation
20151021 Air Select 1445001 A232 7380 Vendor
How would I do this?
Thanks in advance!
Upvotes: 1
Views: 724
Reputation: 512
You could either use a DISTINCT clause, or group your data. These will aggregate the data returned into a single line for each unique entry.
SELECT DISTINCT [date], [hits_eventInfo_Category], [hits_eventInfo_Action], [session_id], [user_id], [hits_time], [hits_eventInfo_Label]
FROM [BigQuery]
--OR
SELECT [date], [hits_eventInfo_Category], [hits_eventInfo_Action], [session_id], [user_id], [hits_time], [hits_eventInfo_Label]
FROM [BigQuery]
GROUP BY [date], [hits_eventInfo_Category], [hits_eventInfo_Action], [session_id], [user_id], [hits_time], [hits_eventInfo_Label]
Note: This won't delete your duplicate data, it just won't be displayed in the results of your select statement. If you would like the duplicate entries deleted permenantely, use @singhsac 's response utilizing window functions.
Upvotes: 3
Reputation: 401
You can group the duplicates. Keep one row, and delete the remaining from the duplicate group:
Try this (I've assumed table name and other fields)
;WITH rmvDuplicate
AS (SELECT ROW_NUMBER() OVER (PARTITION BY [date], [hits_eventInfo_Category], [hits_eventInfo_Action], [session_id], [user_id], [hits_time], [hits_eventInfo_Label]
ORDER BY (SELECT 0)) dup
FROM BigQuery_table)
DELETE FROM rmvDuplicate
WHERE dup > 1
Upvotes: 1