Reputation: 16004
I am trying to use the aggregate function to compute the mean of a variable by group:
using Distributions, PooledArrays
N=Int64(2e9/8); K=100;
pool = [@sprintf "id%03d" k for k in 1:K]
pool1 = [@sprintf "id%010d" k for k in 1:(N/K)]
function randstrarray(pool, N)
PooledArray(PooledArrays.RefArray(rand(UInt8(1):UInt8(K), N)), pool)
end
using JuliaDB
DT = IndexedTable(Columns([1:N;]), Columns(
id1 = randstrarray(pool, N),
v3 = rand(round.(rand(Uniform(0,100),100),4), N) # numeric e.g. 23.5749
));
res = IndexedTables.aggregate(mean, DT, by=(:id1,), with=:v3)
However, I get the error:
MethodError: no method matching mean(::Float64, ::Float64)
Closest candidates are:
mean(!Matched::Union{Function, Type}, ::Any) at statistics.jl:19
mean(!Matched::AbstractArray{T,N} where N, ::Any) where T at statistics.jl:57
mean(::Any) at statistics.jl:34
in at base\<missing>
in #aggregate#144 at IndexedTables\src\query.jl:119
in aggregate_to at IndexedTables\src\query.jl:148
However,
IndexedTables.aggregate(+, DT, by=(:id1,), with=:v3)
works fine.
Upvotes: 0
Views: 1315
Reputation: 2260
Edit: use aggregate_vec, which passes each group to the function as a vector:
res = IndexedTables.aggregate_vec(mean, DT, by=(:id1,), with=:v3)
From the help:
help?> IndexedTables.aggregate_vec
aggregate_vec(f::Function, x::IndexedTable)
Combine adjacent rows with equal indices using a function from vector to scalar, e.g. mean.
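To illustrate what "a function from vector to scalar" means here, the following is a minimal sketch in plain Julia 1.x, where mean lives in the Statistics standard library. It does not use IndexedTables; aggregate_runs is a hypothetical helper written only for this illustration, and the sample ids and values are made up.

```julia
using Statistics  # on Julia 1.x, `mean` lives in the Statistics standard library

# Hypothetical helper, not part of IndexedTables: apply a vector-to-scalar
# function `f` to each run of adjacent rows that share the same id.
function aggregate_runs(f, ids::AbstractVector, vals::AbstractVector)
    out_ids = eltype(ids)[]
    out_vals = Float64[]
    i = 1
    while i <= length(ids)
        j = i
        # extend j to the last row of the current run of equal ids
        while j < length(ids) && ids[j + 1] == ids[i]
            j += 1
        end
        push!(out_ids, ids[i])
        push!(out_vals, f(view(vals, i:j)))  # f sees the whole group at once
        i = j + 1
    end
    return out_ids, out_vals
end

ids = ["id001", "id001", "id001", "id002", "id002"]
v3  = [1.0, 2.0, 6.0, 10.0, 20.0]
group_ids, group_means = aggregate_runs(mean, ids, v3)
println(group_ids)    # ["id001", "id002"]
println(group_means)  # [3.0, 15.0]
```

Because the function receives the whole group, mean can be used directly, with no two-argument method required.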
Old answer:
(I am keeping it because it was a pleasant exercise (for me) in creating a helper type and functions when something doesn't work the way we want. Maybe it will help someone in the future :)
I am not sure how you would like to aggregate the mean. My idea is to calculate the "center of gravity" of points with equal mass:
the center of two points is G = (A+B)/2
adding (aggregating) a third point C gives (2G+C)/3 (2G because G's mass is A's mass + B's mass)
etc.
struct Atractor
center::Float64
mass::Int64
end
" two points create a new Atractor with double mass "
mediocre(a::Float64, b::Float64) = Atractor((a+b)/2, 2)
# please forgive the function's name! :)
" aggregate a new point into an Atractor "
function mediocre(a::Atractor, b::Float64)
mass = a.mass + 1
Atractor((a.center*a.mass+b)/mass, mass)
end
Test:
tst_array = rand(Float64, 100);
# foldl guarantees the left-to-right order that mediocre relies on
isapprox(mean(tst_array), foldl(mediocre, tst_array).center)
true # at least in my tests! :)
mean(tst_array) == foldl(mediocre, tst_array).center # sometimes true
For the aggregate function we need a little more work:
import Base.convert
" we need a convert method from Atractor to Float64 because the aggregate
function wants to store the result as Float64 "
convert(::Type{Float64}, x::Atractor) = x.center
And now it (probably :P) works
res = IndexedTables.aggregate(mediocre, DT, by=(:id1,), with=:v3)
id1 │
────────┼────────
"id001" │ 45.9404
"id002" │ 47.0032
"id003" │ 46.0846
"id004" │ 47.2567
...
I hope you see that aggregating the mean this way affects precision! (There are more addition and division operations.)
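A small sketch of the precision point, in plain Julia 1.x (Statistics standard library) and without IndexedTables. The (center, mass) tuple plays the role of the Atractor above; running and xs are names chosen only for this illustration.

```julia
using Statistics

# One step of the incremental "center of gravity" update:
# acc is a (center, mass) pair, x is the next point with mass 1.
running(acc, x) = ((acc[1] * acc[2] + x) / (acc[2] + 1), acc[2] + 1)

xs = rand(1000)
incremental = foldl(running, xs; init = (0.0, 0))[1]

println(incremental - mean(xs))          # usually tiny, but not always exactly zero
println(isapprox(incremental, mean(xs)))
```

The two results agree up to floating-point rounding, so comparing them with isapprox is safer than comparing with ==.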
Upvotes: 1
Reputation: 19132
You need to tell it how to reduce two numbers to one; mean is for arrays. So just use an anonymous function:
res = IndexedTables.aggregate((x,y)->(x+y)/2, DT, by=(:id1,), with=:v3)
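One caveat, sketched in plain Julia 1.x rather than with IndexedTables: repeatedly averaging pairs equals the arithmetic mean only when a group has exactly two rows. The example below assumes a left-to-right reduction (the aggregate docstring only says it is a 2-argument reduction, so the exact order inside IndexedTables is an assumption), and pairavg and addpoint are names made up for the illustration.

```julia
using Statistics

pairavg(x, y) = (x + y) / 2

vals = [1.0, 2.0, 6.0]
folded = foldl(pairavg, vals)   # ((1 + 2) / 2 + 6) / 2
println(folded)                 # 3.75
println(mean(vals))             # 3.0

# Carrying (sum, count) through the reduction keeps the information a mean needs.
addpoint(acc, x) = (acc[1] + x, acc[2] + 1)
s, n = foldl(addpoint, vals; init = (0.0, 0))
println(s / n)                  # 3.0
```

With a plain two-argument reducer, one way to get a true group mean is therefore to aggregate the sum and the count separately and divide afterwards.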
Upvotes: 1
Reputation: 26259
I'd really like to help you, but it took me 10 minutes to install all the packages and another few minutes to run the code and figure out what it actually does (or doesn't). It would be great if you provided a "minimal working example" that focuses on the problem. In fact, the only things needed to reproduce your problem are seemingly IndexedTables and two random arrays.
(Sorry, this is not a complete answer, but too long to be a comment.)
Anyway, if you read the docstring of IndexedTables.aggregate, you see that it requires a function which takes two arguments and returns a single value:
help?> IndexedTables.aggregate
aggregate(f::Function, arr::IndexedTable)
Combine adjacent rows with equal indices using the given 2-argument
reduction function, returning the result in a new array.
The error message you posted says that there is
no method matching mean(::Float64, ::Float64)
Since I don't know what you expect to be calculated, I assume that you want to calculate the mean value of the two numbers. In this case you can define another method for mean():
Base.mean(x, y) = (x+y) / 2
This will fulfil the aggregate function's signature requirements. But I am not sure if this is what you want.
Upvotes: 0