Reputation: 129
How to remove duplicates within a SELECT query using Apache Flink?
I want to remove rows with a duplicate ID, keeping for each ID only the row with the maximum range.
Upvotes: 2
Views: 988
Reputation: 18987
Assuming the query runs on a static data set, the problem can be solved with regular SQL. Since Flink implements standard SQL, the query is not Flink-specific and would run on any relational database system.
SELECT DISTINCT t.id, t.name, t.range
FROM t, (SELECT id, MAX(range) AS maxRange FROM t GROUP BY id) s
WHERE t.id = s.id AND t.range = s.maxRange
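Note that you will lose duplicates if there is an id for which more than one row has the maximum range: DISTINCT collapses such rows when they are identical in all selected columns. Rows that tie on the maximum range but differ in name are all returned.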
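To see how the query treats duplicates and ties, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite stands in for Flink only because the answer says the query is plain SQL. The sample rows, the helper name max_range_rows, and the added ORDER BY (for stable output) are assumptions made for illustration. The column name range is double-quoted because it is a keyword in some SQL dialects; Flink SQL may require backticks instead.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the answer's query.
ROWS = [
    (1, "a", 10),
    (1, "b", 30),
    (1, "b", 30),  # exact duplicate of the row above
    (2, "c", 5),
    (2, "d", 5),   # ties with the row above on the maximum range
    (3, "e", 7),
]

# The answer's query, with "range" quoted and an ORDER BY added for stable output.
QUERY = """
SELECT DISTINCT t.id, t.name, t."range"
FROM t, (SELECT id, MAX("range") AS maxRange FROM t GROUP BY id) s
WHERE t.id = s.id AND t."range" = s.maxRange
ORDER BY t.id, t.name
"""


def max_range_rows(rows):
    """Load rows into an in-memory table t and return the max-range rows per id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE t (id INTEGER, name TEXT, "range" INTEGER)')
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in max_range_rows(ROWS):
        print(row)
```

With this data, id 1 comes back once even though its maximum row is stored twice, while id 2 comes back twice because its two tied rows differ in name.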
Upvotes: 1