Incognito

Reputation: 145

Hive: Removing Duplicate Rows from Table

I have a table containing millions of records, and every record has duplicates. I am trying to extract the distinct rows from the table. Here's the query I am using:

CREATE TABLE unique_table AS SELECT DISTINCT * FROM duplicates_table;

Is this an efficient way to do the job? Or is there a way to remove duplicate rows without creating a new table?

Upvotes: 1

Views: 126

Answers (1)

leftjoin

Reputation: 38290

You can use the same table:

INSERT OVERWRITE TABLE table_name SELECT DISTINCT * FROM table_name;
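As a rough sketch of how this could look end to end, here is the same statement applied to the table from the question, with a row count before and after to check the result. It uses the TABLE keyword that Hive's INSERT OVERWRITE syntax expects. I am assuming duplicates_table is an unpartitioned table and that rows count as duplicates only when every column matches; adjust if your case differs.

```sql
-- Assumption: duplicates_table (name taken from the question) is unpartitioned.

-- Row count before the rewrite, for comparison.
SELECT COUNT(*) FROM duplicates_table;

-- Rewrite the table in place, keeping one copy of each fully identical row.
INSERT OVERWRITE TABLE duplicates_table
SELECT DISTINCT * FROM duplicates_table;

-- Row count after the rewrite; it should now equal the number of distinct rows.
SELECT COUNT(*) FROM duplicates_table;
```

Note that the overwrite replaces the table's existing data, so it may be worth keeping a backup copy until the new count looks right.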

Upvotes: 2
