Kshitij Aggarwal

Reputation: 5287

Selecting random records from appengine data store (java)

I'm using GAE with Java (JPA) to retrieve records from my database.

I have more than 2000 records (and growing) at the moment, and I have to show 60 totally random records out of those. Currently I have implemented it in the following manner:

  1. Get all keys from DB in an array
  2. Generate 60 random integers in the range 0 <= index < size(keys_array)
  3. Get these 60 records by key value
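A minimal sketch of step 2, assuming the keys are already in a list. The class and method names are hypothetical, and plain `java.util` stands in for the datastore. It draws indices into a set so that the same record is never picked twice, which a plain loop of 60 random integers does not guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RandomKeySample {

    /** Returns `count` distinct elements of `keys`, chosen uniformly at random. */
    public static <K> List<K> sample(List<K> keys, int count, Random rng) {
        if (count > keys.size()) {
            throw new IllegalArgumentException("count exceeds number of keys");
        }
        // A set rejects repeated indices, so no key is selected twice.
        Set<Integer> indices = new LinkedHashSet<>();
        while (indices.size() < count) {
            indices.add(rng.nextInt(keys.size())); // 0 <= index < keys.size()
        }
        List<K> picked = new ArrayList<>(count);
        for (int i : indices) {
            picked.add(keys.get(i));
        }
        return picked;
    }
}
```

The picked keys would then be passed to whatever batch get the persistence layer offers (step 3).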

The problem I'm facing with this implementation is that the whole DB key set is downloaded on every request, which adds a lot to the 'Datastore Small Operations' (free) quota. It also feels inefficient if this code is to scale in the future.

Is there any way to get the 60 random records without downloading all the keys?

Upvotes: 2

Views: 779

Answers (5)

Imran Qamer

Reputation: 2265

Two options come to mind. First, you can cache the keys, so the full key fetch only costs time on the first request.

Second, you can use a random function in your query to fetch random records.

Upvotes: 0

Usman Farooq

Reputation: 76

Record the current Unix timestamp as a constant when you first deploy the code. Then, with each record, save an additional column holding the Unix timestamp at the time the record is saved. Every time you query for random records, generate a random Unix timestamp between the constant timestamp and the current timestamp, and fetch the records ordered by the timestamp column, starting from that generated value.
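As a rough sketch of this idea, assuming a sorted map stands in for an index on the timestamp column (the class name, the method name, and the wrap-around step are assumptions added for illustration, not part of the suggestion above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.Random;

public class TimestampPivot {

    /**
     * Picks a random pivot in [deployedAt, now] and returns up to `limit`
     * records in ascending timestamp order, starting at the pivot.
     * `byCreatedAt` is a stand-in for an index on the stored timestamp.
     */
    public static <R> List<R> fetch(NavigableMap<Long, R> byCreatedAt,
                                    long deployedAt, long now, int limit, Random rng) {
        // nextDouble() is in [0, 1), so pivot falls in deployedAt..now inclusive.
        long pivot = deployedAt + (long) (rng.nextDouble() * (now - deployedAt + 1));
        List<R> out = new ArrayList<>();
        for (R r : byCreatedAt.tailMap(pivot, true).values()) {
            if (out.size() == limit) break;
            out.add(r);
        }
        // Assumption: wrap around to the oldest records if the pivot landed near the end.
        for (R r : byCreatedAt.headMap(pivot, false).values()) {
            if (out.size() == limit) break;
            out.add(r);
        }
        return out;
    }
}
```

Note that the records returned this way are a contiguous run by timestamp, so only the starting point is random. Whether that counts as "totally random" depends on the use case.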

Upvotes: 1

Nick

Reputation: 1822

There is a special property applied to entities called __scatter__, which is used by the map reduce framework to sample entities.

It is mentioned in the javadoc here

I'm not really able to find information on how this works or when it is applied; however, running a query in the datastore viewer seems to yield results. Not all of the entities seem to have this property, though.

You could try just doing the following and see how it pans out:

SELECT * FROM Kind order by __scatter__

Upvotes: 1

Mihail Russu

Reputation: 2536

  1. You could memcache all 2000 keys so you don't have to query for them every single request.

  2. You could memcache sets of 60 random records, so that each user gets a random set that was already generated for (and shown to) previous users (might not be applicable to your case).

None of this is scalable though.
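A minimal sketch of option 1, assuming an in-process holder stands in for memcache. `CachedKeys`, its `loader` (which would run the keys-only query), and the injectable `clock` are all hypothetical names for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class CachedKeys<K> {
    private final Supplier<List<K>> loader; // hypothetical: fetches all keys from the datastore
    private final long ttlMillis;           // how long a cached key list stays valid
    private final LongSupplier clock;       // e.g. System::currentTimeMillis
    private List<K> keys;
    private long loadedAt;

    public CachedKeys(Supplier<List<K>> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached keys, reloading only when missing or older than the TTL. */
    public synchronized List<K> get() {
        long now = clock.getAsLong();
        if (keys == null || now - loadedAt >= ttlMillis) {
            keys = Collections.unmodifiableList(new ArrayList<>(loader.get()));
            loadedAt = now;
        }
        return keys;
    }
}
```

With a TTL, records added after the last reload are not eligible until the cache expires, which is the trade-off for skipping the key query on most requests.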

Upvotes: 3

Simon Fischer

Reputation: 1196

Depends. Three solutions, none of which may be ideal:

  • If you can use consecutive numbers as keys (you could add a second key if you already have another business id), generate a random key in Java and query by id.
  • Try ORDER BY RAND() and get the first 60 results, but this is not portable since RAND or RANDOM are not part of the JPA spec.
  • Select all records and use Query.setFirstResult(random).setMaxResults(1) to scroll to a random row, and repeat 60 times.
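The first option could be sketched roughly as follows, assuming ids run from 1 to a known `maxId` and may have gaps (deleted rows). `RandomIdLookup`, `findById`, and `maxAttempts` are hypothetical names; `findById` stands in for something like a lookup by primary key:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.Set;
import java.util.function.LongFunction;

public class RandomIdLookup {

    /**
     * Draws random ids in 1..maxId and looks each one up, skipping ids that
     * were already drawn or that no longer exist, until `count` records are
     * found or `maxAttempts` draws have been made.
     */
    public static <R> List<R> fetchRandom(long maxId, int count, int maxAttempts,
                                          LongFunction<Optional<R>> findById, Random rng) {
        Set<Long> tried = new HashSet<>();
        List<R> found = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts && found.size() < count; attempt++) {
            long id = 1 + (long) (rng.nextDouble() * maxId); // 1 <= id <= maxId
            if (!tried.add(id)) {
                continue; // id was drawn before, do not fetch it twice
            }
            findById.apply(id).ifPresent(found::add);
        }
        return found;
    }
}
```

The attempt cap keeps the loop bounded when many ids are missing; the caller may get fewer than `count` records in that case.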

Upvotes: 2
