HiveQL: how to remove duplicate rows based on two columns

Question

I am building a table that stores the edges of an undirected graph, like the one below.

+-------------------+------------------------+----------------------+
|     id            |     node_a             |        node_b        |
+-------------------+------------------------+----------------------+
|     1             |         a              |           b          |
+-------------------+------------------------+----------------------+
|     2             |         a              |           c          |
+-------------------+------------------------+----------------------+
|     3             |         a              |           d          |
+-------------------+------------------------+----------------------+
|     4             |         b              |           a          |
+-------------------+------------------------+----------------------+
|     5             |         b              |           c          |
+-------------------+------------------------+----------------------+
...

Rows id=1 and id=4 are duplicates, because in an undirected graph the edge (a, b) is the same as the edge (b, a), so one of them should be deleted. What is an efficient way to remove all such duplicate rows from this table?

Answers (1)
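
One possible approach, offered as a minimal sketch rather than a definitive answer: put each edge's two endpoints in a canonical order with least() and greatest(), then keep one row per ordered pair using ROW_NUMBER(). The table names edges and edges_dedup are hypothetical, since the question does not name the table. The sketch also assumes that node_a and node_b are never NULL (least() and greatest() return NULL if either argument is NULL) and that the Hive version is recent enough to provide least(), greatest(), and window functions.

```sql
-- Hypothetical table names: edges (source) and edges_dedup (result).
-- Assumes node_a and node_b are non-NULL.
CREATE TABLE edges_dedup AS
SELECT id, node_a, node_b
FROM (
  SELECT id,
         node_a,
         node_b,
         -- (a, b) and (b, a) land in the same partition because the
         -- pair is ordered before it is used as the grouping key.
         ROW_NUMBER() OVER (
           PARTITION BY least(node_a, node_b), greatest(node_a, node_b)
           ORDER BY id
         ) AS rn
  FROM edges
) ranked
-- Keep only the row with the smallest id in each partition.
WHERE rn = 1;
```

On the five sample rows above, this keeps ids 1, 2, 3, and 5 and drops id 4, because (b, a) falls into the same partition as (a, b) and has the larger id. Writing the result to a new table instead of overwriting edges in place leaves the original data available for checking.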