Jonathan
Jonathan

Reputation: 2233

Datanucleus creates subquery instead of join

I have these annotations:

public class Account {
@Persistent(defaultFetchGroup = "true", dependent = "false")
@Column(name = "user_owner_id")
private User user;
} 

public class User {
@Persistent(defaultFetchGroup = "true", mappedBy = "user")
@Element(column = "user_owner_id", dependent = "true")
private Set<Account>      accounts;
}

When initiating the Account class, the query on the database use a SELECT * FROM accounts where exists (SELECT id from users where id=23)

I am trying to give datanucleus an annotation that tells it to run on the database SELECT a.* FROM accounts a JOIN users u on a.id = u.user_id where u.id = 23 as this is more optimal.

So which annotation should I use to make data nucleus change its query formation?

--- addition ----

This is a stripped down version of how we're retrieving the data:

PersistenceManager persistenceManager = persistenceManagerFactory.getPersistenceManager();
      persistenceManager.getFetchPlan().setMaxFetchDepth(FetchPlan.FETCH_SIZE_GREEDY);

Query query = persistenceManager.newQuery("javax.jdo.query.JDOQL", null);

query.setClass(User.class);
query.setFilter("this.uuid==p1");
query.declareParameters("java.lang.String p1");

final List<E> entities = (List<E>) query.execute(uuid);

E entity = entities.iterator().next();

return persistenceManager.detachCopy(entity);

Upvotes: 0

Views: 220

Answers (1)

Neil Stockton
Neil Stockton

Reputation: 11531

You are performing a Query just to get one object, which is very inefficient. Instead you could easily do

User u = pm.getObjectById(User.class, 1);

and this would likely issues 2 SQLs in total; 1 to get the basic User object, and 1 to get the Accounts connected to that User. There would be no EXISTS clause.

With regards to what you are actually doing. A Query is issued. A Query is general and in most use-cases will return multiple objects. The filter clause of the query can be complex. The Query is converted into an SQL to get the basic User fields. It can't get the related objects in a single call, so your log will likely say something about BULK FETCH (or something similar, whatever DataNucleus calls it). This will have an EXISTS clause with the EXISTS subquery restricting the second query to the objects the Query applies to). They do this to avoid the N+1 problem. Using EXISTS is, in general, the most appropriate for the general case of a query. In your specific situation it would have been nice to have an INNER JOIN, but I don't think that is supported as a BULK FETCH option currently. Their code is open source and I know they have asked people to contribute things in the past where they want alternative handling ... so you could contribute something if you want to use a query in this precise situation.

Upvotes: 2

Related Questions