Parsshava Mehta

Reputation: 99

Extract value from DataFrame as Float64 value not as Vector

Below is a sample DataFrame. I want to extract the bw value for Rat as a Float64, not as a Vector{Float64}.

df = DataFrame(id=["Mouse","Rat"],
               time=[1,1],
               bw=[25.0,100.45]) 

I get a Vector{Float64} when I use the code below.

df[in.(df.id, Ref(["Rat"])), :bw]

Can you please tell me how I can get only the value bw=100.45 from the DataFrame? I don't want to use the code below. Instead, I want to reference id=="Rat" so that I can be sure I am getting the correct value from a larger dataset.

df[2, :bw]

Thanks again...!

Upvotes: 2

Views: 134

Answers (4)

DNF

Reputation: 12654

It seems to me like the simplest and most direct solution is to use findfirst:

julia> findfirst(==("Rat"), df.id)
2

Then the complete solution is

julia> df[findfirst(==("Rat"), df.id), :bw]
100.45

It is also the fastest solution posted so far, unless the keys are pre-sorted, in which case a binary search can be faster.
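One thing to keep in mind on a larger dataset: findfirst returns nothing when no row matches, and indexing the DataFrame with nothing raises an error. A minimal sketch of a guarded lookup, assuming you would rather get nothing back than an exception (the helper name bw_for is hypothetical, not part of DataFrames.jl):

```julia
using DataFrames

df = DataFrame(id=["Mouse", "Rat"], time=[1, 1], bw=[25.0, 100.45])

# Hypothetical helper: return the bw value for `key`, or `nothing` if absent.
function bw_for(df, key)
    i = findfirst(==(key), df.id)   # row index of the first match, or `nothing`
    return i === nothing ? nothing : df[i, :bw]
end

bw_for(df, "Rat")   # 100.45
bw_for(df, "Cat")   # nothing
```

Note that findfirst stops at the first match, so duplicate ids would go unnoticed.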

Upvotes: 1

Kai Lukowiak

Reputation: 308

If you don't want to use only, as others have suggested, you can do:

julia> df[in.(df.id, Ref(["Rat"])), :bw][1]
# 100.45

I think this is a bit more legible, and the performance looks almost identical:

julia> @benchmark df[in.(df.id, Ref(["Rat"])), :bw][1]
BenchmarkTools.Trial: 10000 samples with 210 evaluations.
 Range (min … max):  358.929 ns …  23.962 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     369.643 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   396.594 ns ± 523.519 ns  ┊ GC (mean ± σ):  4.35% ± 3.61%

    █
  ▂██▇▄▄▃▄▄▄▃▃▃▄▆▆▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂ ▃
  359 ns           Histogram: frequency by time          472 ns <

 Memory estimate: 320 bytes, allocs estimate: 8.

julia> @benchmark only(df[in.(df.id, Ref(["Rat"])), :bw])
BenchmarkTools.Trial: 10000 samples with 212 evaluations.
 Range (min … max):  354.557 ns …  17.193 μs  ┊ GC (min … max): 0.00% … 97.69%
 Time  (median):     366.354 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   395.770 ns ± 593.797 ns  ┊ GC (mean ± σ):  5.57% ±  3.64%

   █▅
  ▄██▆▃▃▄▃▃▃▃▃▆▆▄▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  355 ns           Histogram: frequency by time          481 ns <

 Memory estimate: 320 bytes, allocs estimate: 8.
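Although the timings match, the two forms behave differently when the filter matches more than one row: [1] quietly takes the first match, whereas only throws. A small sketch in plain Julia, assuming a hypothetical result vector matches that holds two hits:

```julia
# Pretend the filter matched two rows instead of one.
matches = [25.0, 100.45]

# Indexing with [1] silently ignores the second match.
first_value = matches[1]    # 25.0

# `only` refuses to guess and throws an ArgumentError instead.
err = try
    only(matches)
catch e
    e
end

err isa ArgumentError       # true
```

If the point is to be sure you are reading the right value, the error from only may be the safer behavior.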

Upvotes: 2

Dan Getz

Reputation: 18217

If the :id field is the primary key of df, it might be worth sorting df by this field and using binary search for quick access. For example:

julia> sort!(df, :id)
2×3 DataFrame
 Row │ id      time   bw      
     │ String  Int64  Float64 
─────┼────────────────────────
   1 │ Mouse       1    25.0
   2 │ Rat         1   100.45

julia> df[searchsortedfirst(df.id, "Rat"),:bw]
100.45
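A caveat with this approach: searchsortedfirst returns an insertion point, not a guaranteed match. For a missing key it can point at a different row (or one past the end), so it is worth verifying the hit. A sketch under that assumption, with a hypothetical helper named sorted_bw:

```julia
using DataFrames

df = DataFrame(id=["Mouse", "Rat"], time=[1, 1], bw=[25.0, 100.45])
sort!(df, :id)

# Hypothetical helper: binary-search lookup that verifies the hit.
function sorted_bw(df, key)
    i = searchsortedfirst(df.id, key)
    # `i` is only an insertion point; confirm the key is really there.
    return (i <= nrow(df) && df.id[i] == key) ? df[i, :bw] : nothing
end

sorted_bw(df, "Rat")     # 100.45
sorted_bw(df, "Cat")     # nothing (unchecked, index 1 would return Mouse's value)
sorted_bw(df, "Zebra")   # nothing (unchecked, index 3 would be out of bounds)
```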

Upvotes: 1

Nils Gudat

Reputation: 13800

You can do:

julia> only(df[in.(df.id, Ref(["Rat"])), :bw])
100.45

The reason why you are getting a vector back is that you are indexing with a vector. This is consistent with base Julia:

julia> x = ["a", "b"]
2-element Vector{String}:
 "a"
 "b"

julia> x[[true, false]]
1-element Vector{String}:
 "a"

So another option is to index with a scalar:

julia> df[findfirst(in.(df.id, Ref(["Rat"]))), :bw]
100.45

Here, findfirst returns the index of the first true element of your indexing vector, and that index is a scalar.
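When matching against a single value, the mask can also be written as a broadcast equality instead of in.(…, Ref([…])). A short sketch of the same lookup, assuming exactly one row has id "Rat" (the variable names mask and value are just for illustration):

```julia
using DataFrames

df = DataFrame(id=["Mouse", "Rat"], time=[1, 1], bw=[25.0, 100.45])

mask = df.id .== "Rat"         # Boolean mask: [false, true]
value = only(df[mask, :bw])    # 100.45 as a Float64
```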

Upvotes: 2
