buzaku
buzaku

Reputation: 361

Add column conditionally

I am trying to add a new column whose values are conditional on other columns.

using DataFrames, DataFramesMeta

df = DataFrame(a = 1:10, 
               b = StatsBase.sample([0, 1], 10, replace = true), 
               c = StatsBase.sample([0, 1], 10, replace = true), 
               d = StatsBase.sample([0, 1], 10, replace = true))
@linq df |>
    transform(e = ifelse.(:b == 1 || :c == 1 || :d == 1, 1, 0))

But this does not evaluate properly:

    a   b   c   d   e
1   1   0   1   1   0
2   2   1   0   1   0
3   3   0   0   0   0
4   4   1   1   0   0
5   5   1   0   0   0
6   6   0   1   0   0
7   7   0   0   0   0
8   8   1   0   1   0
9   9   1   0   1   0
10  10  0   1   1   0

Where is the condition wrong?

Upvotes: 2

Views: 813

Answers (2)

Bogumił Kamiński
Bogumił Kamiński

Reputation: 69949

Here is how you can do it (I use rand from Base to generate data as it is enough in this case):

using DataFrames, DataFramesMeta

df = DataFrame(a = 1:10, b = rand([0, 1], 10),
               c = rand([0, 1], 10), d = rand([0, 1], 10))
@linq df |>
    transform(e = Int.((:b .== 1) .| (:c .== 1) .| (:d .== 1)))
@linq df |>
    transform(e = ifelse.((:b .== 1) .| (:c .== 1) .| (:d .== 1), "yes", "no"))

The issue is that you have to broadcast the operations inside transform as :b == 1 etc. will be always false.

I also show that in this case you can simply cast the result to integer using Int and ifelse is useful if you would want some general values.

Actually in this case probably @byrow! is simpler:

@byrow! df begin
    @newcol e::Vector{Int}
    :e = :b == 1 || :c == 1 || :d == 1 ? 1 : 0
end

EDIT. Under DataFramesMeta.jl 0.10 it would be:

julia> using DataFrames, DataFramesMeta

julia> df = DataFrame(a = 1:10, b = rand([0, 1], 10),
                      c = rand([0, 1], 10), d = rand([0, 1], 10))
10×4 DataFrame
 Row │ a      b      c      d
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      0      1      0
   2 │     2      0      0      0
   3 │     3      0      1      1
   4 │     4      0      1      0
   5 │     5      1      1      1
   6 │     6      1      1      1
   7 │     7      0      1      1
   8 │     8      0      0      0
   9 │     9      1      0      0
  10 │    10      0      1      0

julia> @rtransform(df, :e = Int(:b == 1 || :c == 1 || :d == 1))
10×5 DataFrame
 Row │ a      b      c      d      e
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      0      1      0      1
   2 │     2      0      0      0      0
   3 │     3      0      1      1      1
   4 │     4      0      1      0      1
   5 │     5      1      1      1      1
   6 │     6      1      1      1      1
   7 │     7      0      1      1      1
   8 │     8      0      0      0      0
   9 │     9      1      0      0      1
  10 │    10      0      1      0      1

Upvotes: 3

fc9.30
fc9.30

Reputation: 2591

Maybe not an elegant but simple solution:

df[df.a .< 5, :b] .= 2

Upvotes: 1

Related Questions