Reputation: 5449
I'm trying basic data analysis with Julia.
I'm following this tutorial with the training dataset that can be found here (the one named train_u6lujuX_CVtuZ9i.csv), using the following code:
using DataFrames, RDatasets, CSV, StatsBase
train = CSV.read("/Path/to/train_u6lujuX_CVtuZ9i.csv");
describe(train[:LoanAmount])
and get this output:
Summary Stats:
Length: 614
Type: Union{Missing, Int64}
Number Unique: 204
instead of the output of the tutorial:
Summary Stats:
Mean: 146.412162
Minimum: 9.000000
1st Quartile: 100.000000
Median: 128.000000
3rd Quartile: 168.000000
Maximum: 700.000000
Length: 592
Type: Int64
% Missing: 3.583062
The tutorial's output also corresponds to what the describe() function in StatsBase.jl should give.
Upvotes: 2
Views: 4660
Reputation: 69819
This is how describe is implemented in the current release of StatsBase.jl. In short, train.LoanAmount does not have an eltype that is a subtype of Real (it is Union{Missing, Int64}), so StatsBase.jl uses a fallback method that prints only the length, eltype, and number of unique values. You can write describe(collect(skipmissing(train.LoanAmount))) to get summary statistics (except the number of missing values, of course).
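As a minimal sketch of the point above, the snippet below uses a small made-up vector (a hypothetical stand-in for train.LoanAmount, not the real data) to show why the eltype check fails and how dropping the missing values fixes it. The names loan and clean are illustrative only.

```julia
using StatsBase

# Hypothetical stand-in for train.LoanAmount: integers mixed with missing values
loan = Union{Missing, Int}[120, missing, 66, 141, missing, 267]

# The eltype is a Union with Missing, so it is not a subtype of Real
eltype(loan) <: Real          # false

# Drop the missing values and materialize a plain Vector{Int}
clean = collect(skipmissing(loan))
eltype(clean) <: Real         # true

# Numeric summary (mean, quartiles, etc.) on the cleaned vector
describe(clean)

# The number of missing values has to be counted separately
n_missing = count(ismissing, loan)
```

Counting with count(ismissing, ...) recovers the one piece of information that skipmissing discards.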
However, I would recommend a different approach. If you want more verbose output for a single column, use:
describe(train, :all, cols=:LoanAmount)
The result is returned as a DataFrame, so you can not only see the statistics but also access them. The :all option prints all statistics; refer to the describe docstring in DataFrames.jl for the available options.
You can find some examples of using this function on a current release of DataFrames.jl here.
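To illustrate accessing the statistics programmatically, here is a small sketch on a made-up DataFrame (the data and the name stats are assumptions for illustration, not the tutorial's dataset). It requests a few named statistics instead of :all so the result columns are predictable.

```julia
using DataFrames

# Hypothetical data standing in for the LoanAmount column
df = DataFrame(LoanAmount = [120, missing, 66, 141, missing, 267])

# Ask for specific statistics on one column; the result is itself a DataFrame
stats = describe(df, :mean, :median, :nmissing, cols=:LoanAmount)

# Each statistic is a column, with one row per described variable
loan_mean    = stats.mean[1]       # computed over the non-missing values
loan_median  = stats.median[1]
loan_missing = stats.nmissing[1]   # number of missing entries
```

Because the summary is an ordinary DataFrame, it can be filtered, joined, or written to disk like any other table.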
Upvotes: 6