AfterFray

Reputation: 1851

How can I get the nth largest value in a Julia DataFrame?

I am looking for a way to find the n largest values in my Julia DataFrame, something like pd.Series.nlargest(n=5, keep='first') in Python.

In more detail, let's say I have a Julia DataFrame such as:

df = DataFrame(Data1 = rand(5), Data2 = rand(5));

    Data1       Data2
    Float64     Float64
1   0.125824    0.841358
2   0.612905    0.337965
3   0.210736    0.66849
4   0.172203    0.377226
5   0.898269    0.448477

How can I get the n largest values from the column Data1?

If n = 3, my expected output is:

5   0.898269
2   0.612905
3   0.210736

Upvotes: 3

Views: 1564

Answers (1)

Bogumił Kamiński

Reputation: 69949

Here is an efficient way to do it. First, to subset rows of a data frame:

julia> using DataFrames

julia> df = DataFrame(Data1 = rand(10), Data2 = rand(10));

julia> df[partialsortperm(df.Data1, 1:3, rev=true), :] # if you need a data frame with top 3 rows
3×2 DataFrame
 Row │ Data1     Data2
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.959456  0.628431
   2 │ 0.856696  0.144034
   3 │ 0.824744  0.996384

julia> df[partialsortperm(df.Data1, 3, rev=true), :] # if you need only the row with the 3rd largest value
DataFrameRow
 Row │ Data1     Data2
     │ Float64   Float64
─────┼────────────────────
   4 │ 0.824744  0.996384

Both operations are efficient. A partial sort does only as much work as is needed to obtain the required values, instead of ordering the whole column.
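As a rough illustration of that point, the sketch below uses only Base Julia and a made-up vector `v` (an assumption, not data from the question) to check that `partialsortperm` yields the same leading indices as a full `sortperm`, while only the first three positions have to be put in order.

```julia
# Made-up sample data with distinct values (an assumption for illustration).
v = [0.13, 0.61, 0.21, 0.17, 0.90, 0.45]

# Full sort: orders every index, then keeps the first three.
full_top3 = sortperm(v, rev=true)[1:3]

# Partial sort: only the first three positions are guaranteed to be ordered.
partial_top3 = partialsortperm(v, 1:3, rev=true)

println(full_top3)              # indices of the three largest values
println(collect(partial_top3))  # the same indices, obtained via a partial sort
```

For a five-row data frame the difference is negligible; it should matter mostly when the column is long and n is small.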

If you do not need whole rows of the data frame, but only values from a single column, the following is enough:

julia> partialsort(df.Data1, 1:3, rev=true) # top 3 values
3-element view(::Vector{Float64}, 1:3) with eltype Float64:
 0.959456038630526
 0.856695598334831
 0.8247444664227905

julia> partialsort(df.Data1, 3, rev=true) # 3rd largest value
0.8247444664227905
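
If a pandas-style call is more convenient, the approach above could be wrapped in a small helper. The function name `nlargest` below is hypothetical (it is not part of DataFrames.jl), and the data is copied from the question so the result can be compared with the expected output there. This is a minimal sketch, not a drop-in replacement for `pd.Series.nlargest`.

```julia
using DataFrames

# Hypothetical helper (not provided by DataFrames.jl): return the rows of `df`
# that hold the n largest values of column `col`, largest first.
function nlargest(df::AbstractDataFrame, col::Symbol, n::Integer)
    k = min(n, nrow(df))  # avoid asking for more rows than the data frame has
    idx = partialsortperm(df[!, col], 1:k, rev=true)
    return df[idx, :]
end

# Data taken from the question.
df = DataFrame(Data1 = [0.125824, 0.612905, 0.210736, 0.172203, 0.898269],
               Data2 = [0.841358, 0.337965, 0.66849, 0.377226, 0.448477])

top3 = nlargest(df, :Data1, 3)
println(top3)
```

Note that the returned data frame renumbers its rows from 1, so the original row numbers (5, 2, 3 in the question) are not kept; `partialsortperm(df.Data1, 1:3, rev=true)` on its own gives those indices.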

Upvotes: 5
