Reputation: 11336
I try to find over a 3M table, all the users who have the same username. I read something like this may do the trick.
User.find(:all, :group => [:username], :having => "count(*) > 1" )
However since I'm using Postgres this return me ActiveRecord::StatementInvalid: PG::Error: ERROR: column "users.id" must appear in the GROUP BY clause or be used in an aggregate function
.
I'm trying something like this
User.select('users.id, users.username').having("count(*) > 1").group('users.username')
But still get the same error. Any idea what I'm doing wrong?
Update: I made it somehow work using User.select('users.*').group('users.id').having('count(users.username) > 1')
but this query returns me this which looks like an empty array even if is founding 5 records.
GroupAggregate (cost=9781143.40..9843673.60 rows=3126510 width=1365)
Filter: (count(username) > 1)
-> Sort (cost=9781143.40..9788959.68 rows=3126510 width=1365)
Sort Key: id
-> Seq Scan on users (cost=0.00..146751.10 rows=3126510 width=1365)
(5 rows)
=> []
Any idea why this is happening and how to get those 5 rows?
Upvotes: 4
Views: 5690
Reputation: 972
"group by" in database collapses each group into one row in output. Most likely what you are intending will be produced by the following query:
User.where("name in (select name from users group by name having count(*)>1)").order(:name)
The inner query above finds all names that appear more than once. Then we find all rows with these names. Ordering by name will make your further processing easier. To speedup, add index to column name in users table.
There are alternate Postgres specific ways to solve this, however the above will work across all databases.
Upvotes: 0
Reputation: 10593
I think the best you could get is to get usernames for duplicate records. That can be achieved with
User.select(:username).group(:username).having('COUNT(username) > 1')
Upvotes: 5