KT_1

Reputation: 8474

New column in R using if statement

For a sample dataframe:

df2 <- structure(list(
  region = c("AT22", "AT13", "AT12", "AT11", "AT33", "AT31", "AT21", "AT34",
             "AT32", "BE21", "BE10", "BE24", "BE31", "BE25", "BE23", "BE32",
             "BE33", "BE22", "BE34", "BE35"),
  N = c(241L, 346L, 306L, 55L, 139L, 311L, 107L, 79L, 119L, 244L, 143L, 146L,
        59L, 212L, 203L, 223L, 173L, 147L, 54L, 75L),
  freq.1 = c(62L, 104L, 64L, 20L, 24L, 78L, 23L, 10L, 20L, 65L, 24L, 29L, 9L,
             46L, 51L, 74L, 36L, 33L, 14L, 16L),
  result = c(24.95, 29.97, 21.1, 36.27, 18.38, 24.8, 21.28, 12.54, 17.21,
             26.64, 16.78, 19.86, 15.25, 21.7, 25.12, 33.18, 20.81, 22.45,
             25.93, 21.33),
  level = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
  delete = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)),
  .Names = c("region", "N", "freq.1", "result", "level", "delete"),
  class = c("data.table", "data.frame"),
  row.names = c(NA, -20L))

I want to create a variable called 'delete' that flags observations with 'N' greater than or equal to 100 or 'freq.1' greater than or equal to 20. Currently, I am using the following code:

df2$delete <- if (df2$N >= 100 | df2$freq.1 >= 20) 1 else 0

... but it is putting 1s in every row - rows 8, 13, 19 and 20 should in fact have 0s.

Any ideas?

Upvotes: 1

Views: 66

Answers (1)

akrun

Reputation: 886938

One quick, hacky way to convert the logical vector to binary (0/1) is the unary +. It should be very fast, though some experts do not recommend it.

df2[, delete:= +(N>=100|freq.1 >=20)]

It can also be done by wrapping the condition in as.integer (not hacky, and considerably faster).

df2[, delete:= as.integer(N>=100|freq.1 >=20)]
df2
#     region   N freq.1 result level delete
# 1:   AT22 241     62  24.95     2      1
# 2:   AT13 346    104  29.97     2      1
# 3:   AT12 306     64  21.10     2      1
# 4:   AT11  55     20  36.27     2      1
# 5:   AT33 139     24  18.38     2      1
# 6:   AT31 311     78  24.80     2      1
# 7:   AT21 107     23  21.28     2      1
# 8:   AT34  79     10  12.54     2      0
# 9:   AT32 119     20  17.21     2      1
#10:   BE21 244     65  26.64     2      1
#11:   BE10 143     24  16.78     2      1
#12:   BE24 146     29  19.86     2      1
#13:   BE31  59      9  15.25     2      0
#14:   BE25 212     46  21.70     2      1
#15:   BE23 203     51  25.12     2      1
#16:   BE32 223     74  33.18     2      1
#17:   BE33 173     36  20.81     2      1
#18:   BE22 147     33  22.45     2      1
#19:   BE34  54     14  25.93     2      0
#20:   BE35  75     16  21.33     2      0
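
As a small base-R check of the two conversions above (a sketch on a hypothetical three-element logical vector, not the OP's data), both the unary + and as.integer turn TRUE/FALSE into integer 1/0:

```r
# Hypothetical logical vector standing in for N >= 100 | freq.1 >= 20
cond <- c(TRUE, FALSE, TRUE)

plus_result <- +cond             # unary plus coerces logical to integer
int_result  <- as.integer(cond)  # explicit, self-documenting coercion

identical(plus_result, int_result)  # TRUE
class(plus_result)                  # "integer"
```

The explicit as.integer call states the intent directly, which is the usual reason it is preferred over the + shortcut.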

The OP's code didn't work because if/else is not vectorized: it evaluates a single condition rather than one per row. It would have worked with ifelse, i.e.

df2[, delete:= ifelse(N>=100|freq.1 >=20, 1, 0)]

ifelse is a convenient/canonical option and comparably fast.
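
To make the if versus ifelse difference concrete, here is a minimal base-R sketch. The vectors are a hypothetical four-row subset (rows 1, 2, 8 and 13 of the OP's data), and the names cond, scalar_result and vector_result are only for illustration. Note that recent R versions (4.2.0 and later) raise an error when if receives a condition of length greater than one, whereas older versions warned and used only the first element, so the sketch passes cond[1] explicitly to mimic the old behaviour.

```r
# Hypothetical subset: rows 1, 2, 8 and 13 of the OP's data
N      <- c(241L, 346L, 79L, 59L)
freq.1 <- c(62L, 104L, 10L, 9L)

cond <- N >= 100 | freq.1 >= 20   # logical vector, one value per row

# `if` decides once, from a single TRUE/FALSE. Using only the first
# element yields one value, which then gets recycled to every row.
scalar_result <- if (cond[1]) 1 else 0

# ifelse() tests the condition element by element.
vector_result <- ifelse(cond, 1, 0)
vector_result   # 1 1 0 0
```

Because the first row satisfies the condition, the single value from if is 1, which matches the column of 1s the OP observed.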

NOTE: The OP's example dataset is a data.table, so we are using the data.table method (:=) to create the column. It assigns in place, so it will be very fast.
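
A short sketch of what assigning in place means, assuming the data.table package is installed. The table dt and the second name alias are hypothetical and exist only for this illustration. Plain assignment of a data.table does not copy it, so a column added with := through one name is visible through the other.

```r
library(data.table)

# Hypothetical three-row table, not the OP's full data
dt <- data.table(N = c(241L, 79L, 55L), freq.1 = c(62L, 10L, 20L))

alias <- dt   # a second name for the same table, no copy is made

# := adds the column by reference instead of building a new table
dt[, delete := as.integer(N >= 100 | freq.1 >= 20)]

names(alias)  # the new column shows up under the other name too
```

When an independent copy is needed, data.table provides copy() for that purpose.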

Upvotes: 2
