charlie
charlie

Reputation: 181

Find record whose field 'name' not contained within any other record

I have a model Foo with a String bar and a String name. Some records' bar contain the name of other records in them. This is intentional.

I want to find the "root Foo" records - that is, the ones where their name do not appear in the bar records of any other Foo records.

Example:

Foo
id: 1
name: 'foo1'
bar:  'something something'

id: 2
name: 'foo2'
bar:  'foo1 something'

id: 3
name: 'foo3'
bar:  'foo1, foo4'

My method root_foos would return foo2 and foo3 since their names do not appear in any bar string.

edit: I don't want to use a relation or foreign key here - just this method.

Upvotes: 0

Views: 77

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657777

SELECT f.*
FROM   foo f
WHERE  NOT EXISTS (
   SELECT 1
   FROM   foo f2
   WHERE  f.name <@ string_to_array(f2.bar, ', ')
   );

Replace ', ' with the delimiter you actually use.
This would be much faster with an index. You can build a functional GIN index to work with your current unfortunate design:

CREATE INDEX foo_bar_arr_gin_idx ON foo USING GIN (string_to_array(bar, ', '));

bar should really be implemented as array instead of a string. Then you can simplify both query and index:
Table indexes for Text[] array columns
Why isn't my PostgreSQL array index getting used (Rails 4)?

Normalize schema

As mentioned in the linked answer, the database schema would be much cleaner (queries and indexing simpler and faster) if you implement the n:m relationship properly. Performance depends on many variables.

CREATE TABLE foo
   foo_id serial PRIMARY KEY
 , foo    text
);

CREATE TABLE foo_foo
   foo_id1 int REFERENCES foo
 , foo_id2 int REFERENCES foo
 , PRIMARY KEY (foo_id1, foo_id2) -- provides necessary index automatically
);

Then your data would look like this:

foo
foo_id: 1
foo: 'foo1'

foo_id: 2
foo: 'foo2'

foo_id: 3
foo:   'foo3'


foo_foo
foo_id1: 1
foo_id2: some_id

foo_id1: 1
foo_id2: some_other_id

foo_id1: 2
foo_id2: 1

foo_id1: 2
foo_id2: some_id

foo_id1: 3
foo_id2: 1

foo_id1: 3
foo_id2: 4

And the query:

SELECT f.*
FROM   foo f
WHERE  NOT EXISTS (
   SELECT 1
   FROM   foo_foo f2
   WHERE  f2.foo_id2 = f.foo_id
   );

Upvotes: 2

ctide
ctide

Reputation: 5257

This is going to be horrifically slow with any number of records, and I'd highly recommend restructuring your schema if this is a query you need with any frequency but:

objs = Model.all.to_a
objs.select { |obj| !objs.any? { |inner_obj| inner_obj[:bar].index(obj[:name]) } }

Upvotes: -1

Related Questions