Reputation: 4036
Given a table, I wish to remove rows where a column value is distinct.
So if we want to do this for column2 on matrix A
c1 c2 c3
A = |1 2 4 |
|1 2 5 |
|1 1 6 |
yields
c1 c2 c3
A = |1 2 4 |
|1 2 5 |
This can be done easily by
SELECT * FROM Table WHERE c2 IN
(SELECT c2 FROM Table GROUP BY c2 HAVING COUNT(*) > 1)
Unfortunately, in the middle of a subquery, you don't have the data stored in a Table and I don't want to create a view as I need to do this filtering all in one query.
Any ideas on how I can still filter out distinct rows with respect to a single column without referencing a Table in the subquery?
The solution should be of the form:
SELECT <something goes here>
FROM <the subquery which outputs A goes here>
<anything you want here that is legal Bigquery - e.g. can't reference A>
and there is no table to reference.
Upvotes: 1
Views: 1277
Reputation: 1269633
Bigquery supports window functions, so you might be able to do this:
select t.*
from (select t.*, count(*) over (partition by col2) as cnt
from table t
) t
where cnt >= 2;
This still references the table in a subquery, but that is the only reference to the table.
Upvotes: 4