Reputation: 99
Below is a sample DataFrame. I want to extract the bw value for Rat as a Float64, not as a Vector{Float64}.
using DataFrames
df = DataFrame(id=["Mouse", "Rat"],
               time=[1, 1],
               bw=[25.0, 100.45])
I get a Vector{Float64} when I use the following code:
df[in.(df.id, Ref(["Rat"])), :bw]
Can you please tell me how I can get only the value bw=100.45 from the DataFrame? I don't want to use the code below; instead, I want to select the row by id == "Rat", so that I can be sure I am getting the correct value from a larger dataset.
df[2, :bw]
Thanks again...!
Upvotes: 2
Views: 134
Reputation: 12654
It seems to me that the simplest and most direct solution is to use findfirst:
julia> findfirst(==("Rat"), df.id)
2
The complete solution is then:
julia> df[findfirst(==("Rat"), df.id), :bw]
100.45
It is also the fastest solution so far, unless the keys are pre-sorted.
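One caveat: findfirst returns nothing when no row matches, and nothing is not a valid row index. A minimal sketch of a guarded lookup follows, assuming plain vectors as stand-ins for df.id and df.bw so that it runs without DataFrames; the function name lookup_bw is hypothetical.

```julia
ids = ["Mouse", "Rat"]   # stand-in for df.id
bw  = [25.0, 100.45]     # stand-in for df.bw

# Hypothetical helper: return the bw value for `key`, or throw a KeyError
# instead of silently indexing with `nothing`.
function lookup_bw(ids, bw, key)
    i = findfirst(==(key), ids)           # Int index of the first match, or nothing
    i === nothing && throw(KeyError(key))
    return bw[i]
end

println(lookup_bw(ids, bw, "Rat"))        # 100.45
```

With a real DataFrame the same pattern applies to df.id and df.bw.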
Upvotes: 1
Reputation: 308
If you don't want to use only, as others have suggested, you can do:
julia> df[in.(df.id, Ref(["Rat"])), :bw][1]
# 100.45
I think this is a bit more legible, and the performance looks almost identical:
julia> @benchmark df[in.(df.id, Ref(["Rat"])), :bw][1]
BenchmarkTools.Trial: 10000 samples with 210 evaluations.
Range (min … max): 358.929 ns … 23.962 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 369.643 ns ┊ GC (median): 0.00%
Time (mean ± σ): 396.594 ns ± 523.519 ns ┊ GC (mean ± σ): 4.35% ± 3.61%
Histogram: frequency by time, 359 ns to 472 ns
Memory estimate: 320 bytes, allocs estimate: 8.
julia> @benchmark only(df[in.(df.id, Ref(["Rat"])), :bw])
BenchmarkTools.Trial: 10000 samples with 212 evaluations.
Range (min … max): 354.557 ns … 17.193 μs ┊ GC (min … max): 0.00% … 97.69%
Time (median): 366.354 ns ┊ GC (median): 0.00%
Time (mean ± σ): 395.770 ns ± 593.797 ns ┊ GC (mean ± σ): 5.57% ± 3.64%
Histogram: frequency by time, 355 ns to 481 ns
Memory estimate: 320 bytes, allocs estimate: 8.
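The two forms only behave the same when exactly one row matches. [1] silently takes the first of several matches (and throws a BoundsError when there are none), whereas only throws an ArgumentError unless there is exactly one element. A minimal sketch, assuming plain vectors stand in for the filtered bw column; the helper throws and the duplicate/empty inputs are hypothetical:

```julia
# Stand-ins for the result of df[in.(df.id, Ref(["Rat"])), :bw]
one_match  = [100.45]           # exactly one "Rat" row
many_match = [100.45, 101.0]    # hypothetical: "Rat" appears twice
no_match   = Float64[]          # hypothetical: no "Rat" rows

# Hypothetical helper: true if f() throws an exception of type T, false otherwise
function throws(f, ::Type{T}) where {T}
    try
        f()
    catch e
        return e isa T
    end
    return false
end

println(one_match[1], " ", only(one_match))            # 100.45 100.45
println(many_match[1])                                  # 100.45 (second match ignored)
println(throws(() -> only(many_match), ArgumentError))  # true
println(throws(() -> no_match[1], BoundsError))         # true
println(throws(() -> only(no_match), ArgumentError))    # true
```

If the goal is to be sure the value came from the single intended row, only is the stricter check.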
Upvotes: 2
Reputation: 18217
If the :id column is the 'primary key' of df, it might be worth sorting df by this column and using binary search for quick access. For example:
julia> sort!(df, :id)
2×3 DataFrame
 Row │ id      time   bw
     │ String  Int64  Float64
─────┼────────────────────────
   1 │ Mouse       1    25.0
   2 │ Rat         1   100.45
julia> df[searchsortedfirst(df.id, "Rat"),:bw]
100.45
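Note that searchsortedfirst never signals a missing key: it returns the insertion position, so a lookup for an absent id silently lands on a neighbouring row (or one past the end). A minimal sketch of a guarded binary-search lookup follows, assuming plain sorted vectors stand in for df.id and df.bw; the function name lookup_sorted is hypothetical.

```julia
ids = ["Mouse", "Rat"]   # stand-in for df.id, already sorted
bw  = [25.0, 100.45]     # stand-in for df.bw, in the same row order

# Hypothetical helper: binary search, then confirm the key is really there
function lookup_sorted(ids, bw, key)
    i = searchsortedfirst(ids, key)   # first index with ids[i] >= key
    # For an absent key, i is the insertion point (possibly length(ids) + 1)
    (i <= length(ids) && ids[i] == key) || throw(KeyError(key))
    return bw[i]
end

println(lookup_sorted(ids, bw, "Rat"))   # 100.45
println(searchsortedfirst(ids, "Cat"))   # 1, even though "Cat" is not present
```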
Upvotes: 1
Reputation: 13800
You can do:
julia> only(df[in.(df.id, Ref(["Rat"])), :bw])
100.45
You are getting a vector back because you are indexing with a vector (a Bool mask). This is consistent with Base Julia:
julia> x = ["a", "b"]
2-element Vector{String}:
"a"
"b"
julia> x[[true, false]]
1-element Vector{String}:
"a"
So another option is to index with a scalar:
julia> df[findfirst(in.(df.id, Ref(["Rat"]))), :bw]
100.45
Here findfirst returns the index of the first true element of your indexing vector, which is a scalar.
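The vector-versus-scalar distinction can be checked in plain Base Julia without a DataFrame. A short sketch, where mask plays the role of in.(df.id, Ref(["Rat"])); the variable names mask, v, and s are illustrative:

```julia
x = ["a", "b"]
mask = [true, false]     # Bool mask, analogous to in.(df.id, Ref(["Rat"]))

v = x[mask]              # indexing with a vector returns a vector
s = x[findfirst(mask)]   # findfirst(mask) is the scalar 1, so this returns an element

println(v)               # ["a"]
println(s)               # a
```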
Upvotes: 2