For a sample dataframe:
df2<- structure(list(region = c("AT22", "AT13", "AT12", "AT11", "AT33",
"AT31", "AT21", "AT34", "AT32", "BE21", "BE10", "BE24", "BE31",
"BE25", "BE23", "BE32", "BE33", "BE22", "BE34", "BE35"), N = c(241L,
346L, 306L, 55L, 139L, 311L, 107L, 79L, 119L, 244L, 143L, 146L,
59L, 212L, 203L, 223L, 173L, 147L, 54L, 75L), freq.1 = c(62L,
104L, 64L, 20L, 24L, 78L, 23L, 10L, 20L, 65L, 24L, 29L, 9L, 46L,
51L, 74L, 36L, 33L, 14L, 16L), result = c(24.95, 29.97, 21.1,
36.27, 18.38, 24.8, 21.28, 12.54, 17.21, 26.64, 16.78, 19.86,
15.25, 21.7, 25.12, 33.18, 20.81, 22.45, 25.93, 21.33), level = c(2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), delete = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("region",
"N", "freq.1", "result", "level", "delete"), class = c("data.table",
"data.frame"), row.names = c(NA, -20L))
I want to create a variable called 'delete' which flags the observations that have 'N' greater than or equal to 100 or 'freq.1' greater than or equal to 20. Currently, I am using the following code:
df2$delete <- if (df2$N >=100 | df2$freq.1>=20) 1 else 0
... but it is putting 1s in every row; rows 8, 13, 19 and 20 should in fact have 0s.
Any ideas?
One fast but hacky way to convert the logical vector to binary (0/1) is the unary + operator. This should be very fast, though some experts advise against it.
df2[, delete:= +(N>=100|freq.1 >=20)]
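A quick base-R check of what the unary + does to a logical vector. The vector flags is made up for illustration and stands in for N>=100|freq.1 >=20. Arithmetic on a logical coerces TRUE to 1L and FALSE to 0L, so the result is an integer vector, identical to what as.integer() gives.

```r
# Hypothetical logical vector standing in for N >= 100 | freq.1 >= 20
flags <- c(TRUE, FALSE, TRUE, TRUE)

# Unary plus triggers arithmetic coercion: TRUE -> 1L, FALSE -> 0L
plus_version <- +flags
print(plus_version)          # [1] 1 0 1 1
print(typeof(plus_version))  # [1] "integer"

# Same values and same type as the explicit coercion
stopifnot(identical(plus_version, as.integer(flags)))
```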
It can also be done by wrapping the condition in as.integer (not hacky, and at least as fast).
df2[, delete:= as.integer(N>=100|freq.1 >=20)]
df2
# region N freq.1 result level delete
# 1: AT22 241 62 24.95 2 1
# 2: AT13 346 104 29.97 2 1
# 3: AT12 306 64 21.10 2 1
# 4: AT11 55 20 36.27 2 1
# 5: AT33 139 24 18.38 2 1
# 6: AT31 311 78 24.80 2 1
# 7: AT21 107 23 21.28 2 1
# 8: AT34 79 10 12.54 2 0
# 9: AT32 119 20 17.21 2 1
#10: BE21 244 65 26.64 2 1
#11: BE10 143 24 16.78 2 1
#12: BE24 146 29 19.86 2 1
#13: BE31 59 9 15.25 2 0
#14: BE25 212 46 21.70 2 1
#15: BE23 203 51 25.12 2 1
#16: BE32 223 74 33.18 2 1
#17: BE33 173 36 20.81 2 1
#18: BE22 147 33 22.45 2 1
#19: BE34 54 14 25.93 2 0
#20: BE35 75 16 21.33 2 0
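For readers working with a plain data.frame rather than a data.table, here is a sketch of the same as.integer() approach in base R. The object name df is only illustrative; the N and freq.1 values are copied from the question's df2. The comparison is vectorized, so it yields one TRUE/FALSE per row before the coercion to 0/1.

```r
# Plain data.frame holding the N and freq.1 columns from the question's df2
df <- data.frame(
  N      = c(241L, 346L, 306L, 55L, 139L, 311L, 107L, 79L, 119L, 244L,
             143L, 146L, 59L, 212L, 203L, 223L, 173L, 147L, 54L, 75L),
  freq.1 = c(62L, 104L, 64L, 20L, 24L, 78L, 23L, 10L, 20L, 65L,
             24L, 29L, 9L, 46L, 51L, 74L, 36L, 33L, 14L, 16L)
)

# Vectorized comparison (one value per row), then coerce TRUE/FALSE to 1/0
df$delete <- as.integer(df$N >= 100 | df$freq.1 >= 20)

# Rows that end up flagged 0
print(which(df$delete == 0L))  # [1]  8 13 19 20
```

The zero rows match the ones the question says should be 0.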
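The OP's code didn't work because if/else is not vectorized: it expects a single TRUE or FALSE. Older versions of R used only the first element of the condition (with a warning), and since row 1 meets the condition, the single value 1 was recycled into every row; R 4.2.0 and later stop with an error instead. It would have worked with the vectorized ifelse, i.e.
df2[, delete:= ifelse(N>=100|freq.1 >=20, 1, 0)]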
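ifelse is a convenient, canonical option, though usually somewhat slower than the direct as.integer coercion. Also note that with 1 and 0 as its arguments it produces a double column rather than an integer one.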
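A small base-R sketch contrasting the two behaviours, using the first, eighth and fourth rows of the question's data as plain vectors (the variable names are illustrative). Indexing cond[1] explicitly reproduces what older R effectively did with the OP's if/else, without depending on the R version. ifelse, by contrast, evaluates the condition element by element.

```r
# Rows 1, 8 and 4 of the question's data, as plain vectors
N      <- c(241L, 79L, 55L)
freq.1 <- c(62L, 10L, 20L)
cond   <- N >= 100 | freq.1 >= 20   # TRUE FALSE TRUE: one value per row

# What if/else effectively did in older R: only the first element was used,
# and that single result was then recycled into every row of the column
only_first <- if (cond[1]) 1 else 0
print(only_first)   # [1] 1

# ifelse() is vectorized: one result per element of the condition
by_row <- ifelse(cond, 1, 0)
print(by_row)       # [1] 1 0 1

# Same values as as.integer(cond), but stored as double because 1 and 0 are doubles
stopifnot(identical(by_row, as.numeric(as.integer(cond))))
```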
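NOTE: The OP's example dataset is a data.table, so we use the data.table assignment operator := to create the column. It assigns by reference (in place), which makes it very fast; the data.table package must be loaded with library(data.table) for this syntax to work.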
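A minimal sketch of the in-place behaviour, assuming the data.table package is installed. The three-row table dt is illustrative (rows 1, 8 and 4 of the question's data). It uses data.table's exported address() helper to show that adding the column with := does not create a new copy of the table.

```r
library(data.table)  # provides data.table(), := and address()

# Illustrative three-row table taken from the question's data
dt <- data.table(region = c("AT22", "AT34", "AT11"),
                 N      = c(241L, 79L, 55L),
                 freq.1 = c(62L, 10L, 20L))

before <- address(dt)  # memory address of the table before the update

# := adds the column by reference, so no `dt <- ...` reassignment is needed
dt[, delete := as.integer(N >= 100 | freq.1 >= 20)]

print(dt$delete)  # [1] 1 0 1

# Same object as before: the column was added in place, not on a copy
stopifnot(identical(address(dt), before))
```

Because data.table() over-allocates spare column slots, := can add delete to the existing object instead of rebuilding the table.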