Reputation: 551
I have a dataframe df and I am trying to apply a function to each of its cells. According to the documentation, I should use the transform function.
The function should be applied to each column, so I use [:] as a selector for all columns:
transform(
df, [:] .=> ByRow(x -> (if (x > 1) x else zero(Float64) end)) .=> [:]
)
but it yields an exception
ArgumentError: Unrecognized column selector: Colon() => (DataFrames.ByRow{Main.workspace293.var"#1#2"}(Main.workspace293.var"#1#2"()) => Colon())
although it works fine when I use a single column:
transform(
df, [:K0] .=> ByRow(x -> (if (x > 1) x else zero(Float64) end)) .=> [:K0]
)
Upvotes: 2
Views: 867
Reputation: 69819
The simplest way to do it is to use broadcasting:
julia> df = DataFrame(2*rand(4,3), [:x1, :x2, :x3])
4×3 DataFrame
│ Row │ x1 │ x2 │ x3 │
│ │ Float64 │ Float64 │ Float64 │
├─────┼───────────┼──────────┼──────────┤
│ 1 │ 0.945879 │ 1.59742 │ 0.882428 │
│ 2 │ 0.0963367 │ 0.400404 │ 0.599865 │
│ 3 │ 1.23356 │ 0.807691 │ 0.547917 │
│ 4 │ 0.756098 │ 0.595673 │ 0.29678 │
julia> @. ifelse(df > 1, df, 0.0)
4×3 DataFrame
│ Row │ x1 │ x2 │ x3 │
│ │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┤
│ 1 │ 0.0 │ 1.59742 │ 0.0 │
│ 2 │ 0.0 │ 0.0 │ 0.0 │
│ 3 │ 1.23356 │ 0.0 │ 0.0 │
│ 4 │ 0.0 │ 0.0 │ 0.0 │
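For reference, @. only inserts the dots for you, so the call above should be equivalent to writing the broadcast out explicitly. A minimal sketch on a small hand-made data frame (the values and the name clipped are made up for illustration); it also shows that broadcasting returns a new data frame and leaves df untouched:

```julia
using DataFrames

# Hand-made example data (not from the question).
df = DataFrame(x1 = [0.5, 1.5], x2 = [2.0, 0.25])

# Explicit form of `@. ifelse(df > 1, df, 0.0)`:
# every call and operator gets its own dot.
clipped = ifelse.(df .> 1, df, 0.0)

# `clipped` is a new DataFrame; `df` still holds the original values.
println(clipped)
println(df)
```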
You can also use transform for it if you prefer:
julia> transform(df, names(df) .=> ByRow(x -> ifelse(x>1, x, 0.0)) .=> names(df))
4×3 DataFrame
│ Row │ x1 │ x2 │ x3 │
│ │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┤
│ 1 │ 0.0 │ 1.59742 │ 0.0 │
│ 2 │ 0.0 │ 0.0 │ 0.0 │
│ 3 │ 1.23356 │ 0.0 │ 0.0 │
│ 4 │ 0.0 │ 0.0 │ 0.0 │
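As for why the original attempt fails: [:] is not a column selector but a one-element vector holding Colon(), so the broadcast builds a single pair whose source is Colon(), which matches the selector printed in the error message. names(df) is a vector of column names, so the same broadcast yields one pair per column. A small sketch, assuming a hand-made data frame; the names clip, bad and good are only for illustration:

```julia
using DataFrames

# Hand-made example data (not from the question).
df = DataFrame(x1 = [0.5, 1.5], x2 = [2.0, 0.25])
clip = ByRow(x -> ifelse(x > 1, x, 0.0))

# `[:]` is a Vector containing `Colon()`, so broadcasting `.=>` produces
# one pair of the form `Colon() => (clip => Colon())`.
bad = [:] .=> clip .=> [:]
println(length(bad))
println(bad[1].first)

# `names(df)` is a vector of column names, so broadcasting produces one
# `source => function => destination` pair per column.
good = names(df) .=> clip .=> names(df)
println(length(good))

# Passing the per-column pairs to `transform` overwrites each column in the result.
result = transform(df, good)
println(result)
```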
Also, compared with the linked pandas solution, DataFrames.jl seems faster in this case:
julia> df = DataFrame(2*rand(2,3), [:x1, :x2, :x3])
2×3 DataFrame
Row │ x1 x2 x3
│ Float64 Float64 Float64
─────┼────────────────────────────
1 │ 1.48781 1.20332 1.08071
2 │ 1.55462 1.66393 0.363993
julia> using BenchmarkTools
julia> @btime @. ifelse($df > 1, $df, 0.0)
6.252 μs (58 allocations: 3.89 KiB)
2×3 DataFrame
Row │ x1 x2 x3
│ Float64 Float64 Float64
─────┼───────────────────────────
1 │ 1.48781 1.20332 1.08071
2 │ 1.55462 1.66393 0.0
(In pandas, for a 2×3 data frame, the timings ranged from 163 µs to 2.26 ms.)
Upvotes: 7