Dan Friedman
Dan Friedman

Reputation: 5228

How can I replicate 'PARTITION BY' on DocumentDb

I need to find the record for the last time each person logged in. This would work with T-SQL

SELECT *
FROM
(
  SELECT lh.*, ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY DateCreated DESC) AS RowNumber
  FROM LoginHistory lh
) lhp
where lhp.RowNumber = 1

But due to DocumentDb's reduced feature set, I can't figure out how to handle this.

I assume this would need to be solved as a Stored Procedure, but I can't figure how to structure that either. Loop over multiple async calls? (Not sure how that would even be done.) Download all the records and filter them just using JS?

How can I translate this?

UPDATE: Sample output

PARTITION BY is similar to a GROUP BY, but instead of aggregating the results, it treats the records as a kind of scope. So

  SELECT lh.*, ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY DateCreated DESC) AS RowNumber
  FROM LoginHistory lh

would return something like

UserId    DateCreated            RowNumber
1         2015-12-10 22:44:03    1
1         2015-12-10 13:35:12    2
1         2015-12-09 18:52:25    3
2         2015-12-10 20:53:13    1
2         2015-12-10 08:12:41    2

It basically says, "For a given UserId, order those records by DateCreated".

Then I just select RowNumber = 1 in the outer query and I have the latest record for each user.

My question used the SQL 2015+ syntax, but it is also possible with pre-SQL 2005 syntax, which would be done with something like this:

Select Date, User, Status, Notes 
from [SOMETABLE]
inner join 
(
    Select max(Date) as LatestDate, [User]
    from [SOMETABLE]
    Group by User
) SubMax 
on [SOMETABLE].Date = SubMax.LatestDate
and [SOMETABLE].User = SubMax.User 

Unfortunately, DocumentDb doesn't support GroupBy either.

Upvotes: 1

Views: 932

Answers (1)

Larry Maccherone
Larry Maccherone

Reputation: 9533

It looks like you want the most recent value for a given UserId. I can think of three alternative ways to do this and you suggested the second one:

  1. Fetch everything back to the client and pluck the rows you want out. Could be a bandwidth hog and have latency problems depending upon the size of your data.

  2. Write a Stored Procedure. If this is an operation that you want to do a lot and performance is critical, you might want to take this approach. I was able to quickly adapt my countDocuments sproc to create the sproc you want. You can find it here (UPDATE: Added tests and fixed bug from when I first posted). If you are on .NET, you'll need to precompile that sproc (written in CoffeeScript) and send it to your server. If you are on node.js, I recommend documentdb-utils.loadSprocs to load all sprocs from a particular directory. loadSprocs takes care of compiling and even allows you to modularize and use npm packages inside your sprocs by implementing require() support.

  3. If you expect to do other aggregations or partitions, then I recommend a more general solution, documentdb-lumenize, which is an aggregation "library" running in a stored procedure within DocumentDB. You call the stored procedure with a configuration to specify the "aggregation" (or partition) you want to do. The rest of this answer is a description on how to do this with documentdb-lumenize.

The firstValue (if you ORDER BY DESC) or lastValue (if you use default ASC) aggregation function is what you want.

I'm going to give the answer assuming that you are using node.js/CoffeeScript but you can do the equivalent with JavaScript and/or .NET. I can work up a complete example for you in whichever you prefer. Just let me know.

First step is your query:

filterQuery = "SELECT c.ValueToReturn FROM c ORDER BY c.DateCreated"

Next, you'll define the field that you wish to "partition by"

dimensions = [{field: "UserId"}]

Then, you'll need to define the "metrics".

metrics = [{field: 'ValueToReturn', f: "lastValue", as: "Last Value"}]

Lastly, you bundle that all up into a config object

cubeConfig = {dimensions, metrics}
config = {cubeConfig, filterQuery}

Then, send the config in as the only parameter when you call the "cube" stored procedure. See the documentdb-lumenize docs for examples on how to get the cube to your collection and execute it (CoffeeScript/JavaScript/.NET examples provided).

I have a browser-based implementation of Lumenize that allows me to give you a JSFiddle working example. The only difference is that I don't specify the filterQuery for the browser config. Since documentdb-lumenize works within DocumentDB, it needs this extra information that is not needed browser-based. It's also slightly different from the example above because I stuck with the DESC sort order from your example data that I just cut and paste from. The one you want there is "firstValue". If the sort order is ASC, then you want "lastValue" which is what I show above.

Note, the above example will only return the one field (shown here as "ValueToReturn". If you want more than that, you'll need to make sure they are in your SELECT clause (or just say "*") and you'll need a row for each field in the metrics table. You can build the metrics table programmatically so it'll only be a few lines of code even if you need 50 fields. Let me know if you want a CoffeeScript/JavaScript example of how to do this programmatically.

Upvotes: 1

Related Questions