Lamak
Lamak

Reputation: 101

Django compare queryset from different databases

I need to compare 2 querysets from the same model from 2 different databases.

I expect the difference between them. In this case, I grab only one column (charfield), from two databases and want to compare this "list", i.e. it would be great to work with sets and difference methods of sets.

But I can't simple subtract querysets, also set(queryset) and list(querysets) -- this give me nothing (not an error), i.e.

diff_set = set(articles1) - set(articles2)

I switched db's on the fly, make 2 querysets and try to compare them (filter, or exclude)

articles1 = list(Smdocuments.objects.using('tmp1').only('id').filter(doctype__exact='CQ'))

# right connection
connections.databases['tmp2']['HOST'] = db2.host
connections.databases['tmp2']['NAME'] = db2.name
articles2 = list(Smdocuments.objects.using('tmp2').only('id').filter(doctype__exact='CQ'))

# okay to chain Smdocuments objects, gives all the entries
all = list(chain(articles1, articles2))

# got nothing, even len(diff_set) is none
diff_set = set(articles1) - set(articles2)

# this one raise error Subqueries aren't allowed across different databases.
articles_exclude = Smdocuments.objects.using('tmp1').only('id').filter(doctype__exact='CQ')
len(articles1)
diff_ex = Smdocuments.objects.using('tmp2').only('id').filter(doctype__exact='CQ').exclude(id__in=articles_exclude)
len(diff_ex)

diff_ex raise an error

Subqueries aren't allowed across different databases. Force the inner query to be evaluated using list(inner_query).

So, "Model objects" not so easy to manipulate, and querysets between difference databases as well.

I see, thats not a good db scheme, but it's another application with distributed db, and I need to compare them.

It's would be enough to compare by one column, but probably compare full queryset will work for future. Or, should I convert queryset to list and compare raw data?

Upvotes: 3

Views: 3982

Answers (1)

bruno desthuilliers
bruno desthuilliers

Reputation: 77912

Your question is really unclear about what you actually expect, but here are a couple hints anyway:

First, model instances (assuming they are instances of the same model of course) compare on their primary key value, which is also used as hash for dicts and sets, so if you want to compare the underlying database records you should not work on model instances but on the raw db values as lists of tuples or dicts. You can get those using (resp.) Queryset.values_list() or Queryset.values() - not forgetting to list() them so you really get a list and not a queryset.

Which brings us to the second important point: while presenting themselves as list-likes (in that they support len(), iteration, subscripting and - with some restrictions - slicing), Querysets are NOT lists. You can't compare two querysets (well you can but they compare on identity, which means that two querysets will only be equal if they are actually the very same object), and, more important, using a queryset as an argument to a 'field__in=' lookup will result in a SQL subquery where passing a proper list results in a plain 'field IN (...)' where clause. This explains the error you get with the exclude(...) approach.

To make a long story short, if you want to effectively compare database rows, you want:

# the fields you want to compate records on
fields = 'field1', 'field2', 'fieldN'
rows1 =  list(YouModel.objects.using('tmp1').filter(...).values_list(*fields))
rows2 = list(YouModel.objects.using('tmp2').filter(...).values_list(*fields))

# now you have two lists of tuples so you can apply ordinary python comparisons / set operations etc
print rows1 == rows2
print set(rows1) - set(rows2)
# etc

Upvotes: 3

Related Questions