Reputation: 31
I've been using the following code to generate histograms from binning one column of a Dataframe and using that bin to calculate a median from another column.
using Plots, Statistics, DataFrames
df = DataFrame(x=repeat(["a","b","c"], 5), y=1:15)
res = combine(groupby(df, :x, sort=true), :y => median)
bar(res.x, res.y_median, legend=false)
The code point selects values for the bins and I would like to apply a bin range of values manually, if possible?
Row │ A B_median
│ Any Float64
─────┼───────────────────
1 │ 1515.74 0.09
2 │ 1517.7 0.81
3 │ 1527.22 10.23
4 │ 1529.88 2.95
5 │ 1530.72 17.32
6 │ 1530.86 15.22
7 │ 1532.26 1.45
8 │ 1532.68 18.51
9 │ 1541.08 1.32
10 │ 1541.22 15.78
11 │ 1541.36 0.12
12 │ 1541.5 13.55
13 │ 1541.92 11.99
14 │ 1542.06 21.14
15 │ 1542.34 10.645
16 │ 1542.62 19.95
17 │ 1542.76 21.0
18 │ 1543.32 20.91
For example, instead of calculating a median for rows 9->17 individually. Could these rows be bunched together automatically i.e. 1542.7+/-0.7 and a total median value be calculated for this range? Many thanks!
Upvotes: 2
Views: 158
Reputation: 69939
I assume you want something like this:
julia> using DataFrames, CategoricalArrays, Random, Statistics
julia> Random.seed!(1234);
julia> df = DataFrame(A=rand(20), B=rand(20));
julia> df.A_group = cut(df.A, 4);
julia> res = combine(groupby(df, :A_group, sort=true), :B => median)
4×2 DataFrame
Row │ A_group B_median
│ Cat… Float64
─────┼─────────────────────────────────────────────
1 │ Q1: [0.014908849285099945, 0.532… 0.134685
2 │ Q2: [0.5323651749779272, 0.65860… 0.347995
3 │ Q3: [0.6586057536399257, 0.81493… 0.501756
4 │ Q4: [0.8149335702852593, 0.97213… 0.531899
Upvotes: 2