Reputation: 71
In Julia, I want to test the normality of a variable for each group defined by another column in a data frame.
Let's say we have:
using DataFrames, Distributions
df = DataFrame(x = rand(Normal(), 30), group = repeat(["A", "B"], 15))
I know I can test the normality of x with:
using HypothesisTests
using Distributions
OneSampleADTest(df.x, Normal())
So the question is: how do I test the normality of x for each group? In R, I would use tapply(), but I couldn't find the equivalent in Julia.
Upvotes: 2
Views: 219
Reputation: 14735
If you just want the p-value for each group in the data frame, use combine on the grouped data frame:
julia> combine(groupby(df, :group), :x => (x -> pvalue(OneSampleADTest(x, Normal()))) => :onesampleAD_pvalue)
2×2 DataFrame
 Row │ group   onesampleAD_pvalue
     │ String  Float64
─────┼────────────────────────────
   1 │ A                 0.275653
   2 │ B                 0.544317
If you want to print the test details (or do more complex manipulations) per group, you can loop over the groups instead:
julia> for (key, sdf) in pairs(groupby(df, :group))
           println("Group $(key.group)")
           display(OneSampleADTest(sdf.x, Normal()))
       end
Group A
One sample Anderson-Darling test
--------------------------------
...
Group B
One sample Anderson-Darling test
--------------------------------
...
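If you would rather keep the per-group test objects than print them, the same loop can feed a dictionary keyed by group label, which is roughly the shape tapply() returns in R. This is only a sketch: the name tests is chosen here for illustration, and the data is regenerated so the block runs on its own.

```julia
using DataFrames, Distributions, HypothesisTests

# Same toy data as in the question
df = DataFrame(x = rand(Normal(), 30), group = repeat(["A", "B"], 15))

# Map each group label to its full test object
# (`tests` is an illustrative name, not part of any package API)
tests = Dict(key.group => OneSampleADTest(sdf.x, Normal())
             for (key, sdf) in pairs(groupby(df, :group)))

# Look up any group later, for example its p-value
pvalue(tests["A"])
```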
Upvotes: 2
Reputation: 69949
It depends on what output you expect. I recommend storing the result in a data frame (this is not what tapply does):
julia> gdf = groupby(df, :group, sort=true) # group by :group and keep groups sorted
GroupedDataFrame with 2 groups based on key: group
First Group (15 rows): group = "A"
 Row │ x          group
     │ Float64    String
─────┼───────────────────
   1 │ -0.869008  A
   2 │  0.190041  A
   3 │  0.369881  A
   4 │  0.445092  A
  ⋮  │     ⋮        ⋮
  13 │ -0.599266  A
  14 │  0.696132  A
  15 │  0.788465  A
           8 rows omitted
⋮
Last Group (15 rows): group = "B"
 Row │ x          group
     │ Float64    String
─────┼───────────────────
   1 │ -1.19973   B
   2 │  0.557241  B
   3 │ -0.425667  B
   4 │  0.787917  B
  ⋮  │     ⋮        ⋮
  13 │  1.96912   B
  14 │  0.567594  B
  15 │  1.39739   B
           8 rows omitted
julia> res = combine(gdf, :x => (x -> OneSampleADTest(x, Normal())) => :ADTest)
2×2 DataFrame
 Row │ group   ADTest
     │ String  OneSampl…
─────┼───────────────────────────────────────────
   1 │ A       One sample Anderson-Darling test…
   2 │ B       One sample Anderson-Darling test…
Now res holds both the group name and the result of the test (a full test-result object that you can work with later).
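As one possible way of working with the stored objects later, the sketch below derives a p-value column from the ADTest column row by row. It rebuilds df, gdf and res so that it runs standalone; the output column name ADTest_pvalue is just a choice made here.

```julia
using DataFrames, Distributions, HypothesisTests

df = DataFrame(x = rand(Normal(), 30), group = repeat(["A", "B"], 15))
gdf = groupby(df, :group, sort=true)

# One full test object per group, as above
res = combine(gdf, :x => (x -> OneSampleADTest(x, Normal())) => :ADTest)

# Apply pvalue to each stored test object and add the result as a new column
transform!(res, :ADTest => ByRow(pvalue) => :ADTest_pvalue)
```

Keeping the test objects around means other quantities can be extracted later without rerunning the tests.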
If you are interested only in the p-value, do:
julia> res = combine(gdf, :x => (x -> pvalue(OneSampleADTest(x, Normal()))) => :ADTest_pvalue)
2×2 DataFrame
 Row │ group   ADTest_pvalue
     │ String  Float64
─────┼───────────────────────
   1 │ A            0.469626
   2 │ B            0.750134
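A p-value table like this is often followed by a per-group decision. The sketch below adds a Boolean column that flags groups whose p-value falls below a significance level; the threshold alpha = 0.05 and the names pv and reject_normality are assumptions made for illustration, not something the question specifies.

```julia
using DataFrames, Distributions, HypothesisTests

df = DataFrame(x = rand(Normal(), 30), group = repeat(["A", "B"], 15))
gdf = groupby(df, :group, sort=true)

# One p-value per group
pv = combine(gdf, :x => (x -> pvalue(OneSampleADTest(x, Normal()))) => :ADTest_pvalue)

alpha = 0.05  # hypothetical significance level, adjust to your needs

# true where the test rejects the reference distribution at level alpha
transform!(pv, :ADTest_pvalue => ByRow(<(alpha)) => :reject_normality)
```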
If you are used to dplyr-style syntax, use DataFramesMeta.jl:
julia> using DataFramesMeta
julia> @combine(gdf, :ADTest = OneSampleADTest(:x, Normal()))
2×2 DataFrame
 Row │ group   ADTest
     │ String  OneSampl…
─────┼───────────────────────────────────────────
   1 │ A       One sample Anderson-Darling test…
   2 │ B       One sample Anderson-Darling test…
julia> @combine(gdf, :ADTest_pvalue = pvalue(OneSampleADTest(:x, Normal())))
2×2 DataFrame
 Row │ group   ADTest_pvalue
     │ String  Float64
─────┼───────────────────────
   1 │ A            0.469626
   2 │ B            0.750134
Upvotes: 3