Simon

Reputation: 1394

GAE datastore inequality filter two properties advice

I've got a scenario where I need to query the datastore for some random users who have been active in the last X minutes.

Each of my User entities has a property called 'random'. When I want to find some random users, I generate a random min and max value and use them to query the datastore against the users' 'random' property.

This is what I've got so far:

public static List<Entity> getRandomUsers(Key filterKey, String gender, String language, int maxResults) {
    ArrayList<Entity> nonDuplicateEntities = new ArrayList<>();

    HashSet<Entity> hashSet = new HashSet<>();
    int attempts = 0;
    while (nonDuplicateEntities.size() < maxResults) {
        attempts++;
        if (attempts >= 10) {
            return nonDuplicateEntities;
        }

        double ran1 = Math.random();
        double ran2 = Math.random();

        Filter randomMinFilter = new Query.FilterPredicate(Constants.KEY_RANDOM, Query.FilterOperator.GREATER_THAN_OR_EQUAL, Math.min(ran1, ran2));
        Filter randomMaxFilter = new Query.FilterPredicate(Constants.KEY_RANDOM, Query.FilterOperator.LESS_THAN_OR_EQUAL, Math.max(ran1, ran2));
        Filter languageFilter = new Query.FilterPredicate(Constants.KEY_LANGUAGE, Query.FilterOperator.EQUAL, language);

        Filter randomRangeFilter;
        if (gender == null || gender.equals(Constants.GENDER_ANY)) {
            randomRangeFilter = Query.CompositeFilterOperator.and(randomMinFilter, randomMaxFilter, languageFilter);
        } else {
            Filter genderFilter = new Query.FilterPredicate(Constants.KEY_GENDER, Query.FilterOperator.EQUAL, gender);
            randomRangeFilter = Query.CompositeFilterOperator.and(randomMinFilter, randomMaxFilter, genderFilter, languageFilter);
        }

        Query q = new Query(Constants.KEY_USER_CLASS).setFilter(randomRangeFilter);

        PreparedQuery pq = DatastoreServiceFactory.getDatastoreService().prepare(q);

        List<Entity> entities = pq.asList(FetchOptions.Builder.withLimit(maxResults - nonDuplicateEntities.size()));
        for (Entity entity : entities) {
            if (filterKey.equals(entity.getKey())) {
                continue;
            }
            if (hashSet.add(entity)) {
                nonDuplicateEntities.add(entity);
            }
            if (nonDuplicateEntities.size() == maxResults) {
                return nonDuplicateEntities;
            }
        }
    }

    return nonDuplicateEntities;
}

I now need just users who have been active recently.

Each of the User entities also has a 'last active' property, which I want to include in the query, e.g. last active within the past 30 minutes.

This would mean having an inequality filter on two properties, which I can't do.

What is the most efficient way to do this?

I could get all user entities active in the last X minutes and then pick some random ones. Alternatively, I could leave my code as is and check 'last active' before adding each entity to the non-duplicate list, but this might involve lots of calls to the datastore.

Is there some other way I can do this just using the query?

Upvotes: 0

Views: 415

Answers (1)

Tim Hoffman

Reputation: 12986

As requested in the comments, here is one approach.

Assuming your "last active" property stores a datetime stamp, you can perform a keys-only query with the filter last active > cutoff, where the cutoff is the datetime stamp of interest (for example, now minus 30 minutes).

Once you have the keys, make a random choice from the result set, then explicitly fetch the chosen entities by key with a get operation. This limits the cost to small ops for the keys-only query plus a get for what you actually use.
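The random-choice step can be sketched in plain Java as below. This is only an illustration under assumptions: `RandomKeyPicker` and `pickRandom` are hypothetical names, and the key type is left generic so the snippet runs without the App Engine SDK. In the low-level Java API used in the question, the keys would presumably come from a query built with `Query.setKeysOnly()`, and the picked keys would be passed to `DatastoreService.get(Iterable<Key>)`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: picks up to maxResults distinct keys at random
// from the keys returned by a keys-only "recently active" query.
final class RandomKeyPicker {

    static <K> List<K> pickRandom(List<K> activeKeys, K excludedKey, int maxResults, Random rng) {
        // Copy so the caller's (possibly cached) list is never reordered.
        List<K> candidates = new ArrayList<>(activeKeys);

        // Drop the requesting user's own key, like filterKey in the question.
        candidates.removeIf(k -> k.equals(excludedKey));

        // Shuffle, then take a prefix: a uniform sample without duplicates.
        Collections.shuffle(candidates, rng);
        int count = Math.max(0, Math.min(maxResults, candidates.size()));
        return new ArrayList<>(candidates.subList(0, count));
    }
}
```

Because the sample is drawn in memory from a de-duplicated key set, the retry loop and the HashSet from the question are no longer needed, and only the selected entities are fetched.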

I would then consider caching this set of keys in memcache with a defined expiry period, so that if you need another random choice within that period (say, 2 seconds later) you can reuse the keys rather than re-querying. Accuracy doesn't appear to be too important, given the random choice.

If you do adopt the caching strategy, you do have to deal with cache expiry and refreshing the cache.
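As a rough sketch of the expiry-and-refresh logic, the class below keeps a key list for a fixed time-to-live and reloads it on the first read after expiry. It is an in-process stand-in, not memcache itself: `ExpiringKeyCache`, its constructor parameters, and the injected clock are all hypothetical. On App Engine, I believe the equivalent would be storing the list with `MemcacheService.put(key, value, Expiration.byDeltaSeconds(n))` and re-running the keys-only query whenever `get` returns null.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical in-process stand-in for "keys in memcache with an expiry".
final class ExpiringKeyCache<K> {
    private final long ttlMillis;
    private final LongSupplier clock;        // e.g. System::currentTimeMillis
    private final Supplier<List<K>> loader;  // e.g. runs the keys-only query
    private List<K> cached;
    private long expiresAt;

    ExpiringKeyCache(long ttlMillis, LongSupplier clock, Supplier<List<K>> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    // Returns the cached keys, reloading them once the TTL has elapsed.
    synchronized List<K> get() {
        long now = clock.getAsLong();
        if (cached == null || now >= expiresAt) {
            cached = loader.get();
            expiresAt = now + ttlMillis;
        }
        return cached;
    }
}
```

The TTL sets how stale the "recently active" set may get; a value well below the 30-minute activity window keeps the random picks reasonably current.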

A potential issue here is the dogpile effect, where multiple requests all miss the cache at the same time and each handler starts rebuilding it. In a lightly loaded system this may not be an issue; in a heavily loaded system with a lot of activity, you may want to keep the cache warm with a task. Just something to think about.
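One way to picture the task-driven variant: request handlers only ever read the current snapshot, and a single periodic task is the only caller allowed to rebuild it, so a burst of requests cannot trigger a burst of queries. The sketch below shows that split in plain Java; `TaskRefreshedKeys`, `refresh`, and `snapshot` are hypothetical names, and on App Engine the snapshot would presumably live in memcache and be rewritten by a cron job or task queue handler rather than held in an instance field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder: readers never rebuild, only the periodic task does.
final class TaskRefreshedKeys<K> {
    private final AtomicReference<List<K>> current =
            new AtomicReference<>(Collections.<K>emptyList());
    private final Supplier<List<K>> loader;  // e.g. runs the keys-only query

    TaskRefreshedKeys(Supplier<List<K>> loader) {
        this.loader = loader;
    }

    // Called only by the scheduled task; swaps in a fresh immutable copy.
    void refresh() {
        current.set(Collections.unmodifiableList(new ArrayList<>(loader.get())));
    }

    // Called by request handlers; never triggers a query.
    List<K> snapshot() {
        return current.get();
    }
}
```

The trade-off is that handlers see an empty list until the first refresh has run, so the task should be scheduled more often than the chosen expiry period.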

Upvotes: 3
