Michael
Michael

Reputation: 63

Custom GroupBy Functions for JuliaDB

I'm reading in some stock trade data using the JuliaDB loadtable function and I want to group all dates that are the same and apply a custom function.

The custom function is meant to take the group of trades of the same date and return a single trade with volume being the sum and trade price being the volume weighted average. I can get the sum of the volume but I am struggling to get the volume weighted average.

function vol_weighted_avg(x)
return (x.price .* x.volume) / sum(x.volume)
end

This is the function but it doesn't work with groupby(vol_weighted_avg, trades, :date)

I've spent a long time playing with the groupby and I haven't been able to figure it out nor have I come across documentation that shows how to do something like the above.

Upvotes: 1

Views: 408

Answers (1)

Bogumił Kamiński
Bogumił Kamiński

Reputation: 69819

There are several issues with your code. My answer is for Julia 0.6 and JuliaDB 0.8.4. What you have to understand is that groupby returns a vector of NamedTuples. Therefore the simplest approach to the implementation of your function would be:

vol_weighted_avg(x) =
    sum(getindex.(x, :price) .* getindex.(x,:volume)) / sum(getindex.(x,:volume))

you can use mean from StatsBase to achieve the same:

using StatsBase
vol_weighted_avg2(x) = mean(getindex.(x, :price), Weights(getindex.(x,:volume)))

Unfortunately both versions will not scale well if your data is very large. You could write a custom weighted mean calculation for OnlineStats but in this case I did find it a bit cumbersome. However, in this case it is simple enough to write a function that can be passed to groupreduce like this:

function vol_weighted_avg3(a,b)
    ap, av, bp, bv = a.price, a.volume, b.price, b.volume
    @NT(price=(ap*av + bp*bv)/(av+bv), volume=av+bv)
end

(the only drawback is that you will get a column with totals of volumes traded).

Now let us see the three functions in action:

julia> srand(1);

julia> trades = table(rand(10^7), rand(10^7), rand(1:4, 10^7),
                      names=[:price,:volume,:date], pkey=:date);

julia> groupby(vol_weighted_avg, trades, :date)
Table with 4 rows, 2 columns:
date  vol_weighted_avg
──────────────────────
1     0.500097
2     0.499923
3     0.50013
4     0.499711

julia> groupby(vol_weighted_avg2, trades, :date)
Table with 4 rows, 2 columns:
date  vol_weighted_avg2
───────────────────────
1     0.500097
2     0.499923
3     0.50013
4     0.499711

julia> groupreduce(vol_weighted_avg3, trades, :date)
Table with 4 rows, 3 columns:
date  price     volume
─────────────────────────
1     0.500097  1.2482e6
2     0.499923  1.25025e6
3     0.50013   1.25168e6
4     0.499711  1.249e6

You can check their relative performance to find out that the last option (with groupreduce) is much faster and memory-efficient even for in-memory data:

julia> using BenchmarkTools

julia> @benchmark groupby($vol_weighted_avg, $trades, :date)
BenchmarkTools.Trial:
  memory estimate:  2.16 GiB
  allocs estimate:  105003042
  --------------
  minimum time:     3.301 s (15.41% GC)
  median time:      3.321 s (16.95% GC)
  mean time:        3.321 s (16.95% GC)
  maximum time:     3.341 s (18.47% GC)
  --------------
  samples:          2
  evals/sample:     1

julia> @benchmark groupby($vol_weighted_avg2, $trades, :date)
BenchmarkTools.Trial:
  memory estimate:  1.71 GiB
  allocs estimate:  85003045
  --------------
  minimum time:     2.689 s (15.23% GC)
  median time:      2.748 s (17.09% GC)
  mean time:        2.748 s (17.09% GC)
  maximum time:     2.807 s (18.88% GC)
  --------------
  samples:          2
  evals/sample:     1

julia> @benchmark groupreduce($vol_weighted_avg3, $trades, :date)
BenchmarkTools.Trial:
  memory estimate:  11.50 KiB
  allocs estimate:  254
  --------------
  minimum time:     103.955 ms (0.00% GC)
  median time:      106.777 ms (0.00% GC)
  mean time:        108.810 ms (0.00% GC)
  maximum time:     148.807 ms (0.00% GC)
  --------------
  samples:          47
  evals/sample:     1

Upvotes: 1

Related Questions