Reputation: 1667
I have a dataframe with integers and I would like to convert them into a binary system (or tertiary if you will) where they become 1 if greater than x, -1 if smaller than y, and 0 else.
This is an example dataframe:
var1 var2 var3
30 13 2
20 29 3
This is what my new dataframe should look like (x is 27 and y is 4):
var1 var2 var3
1 0 -1
0 1 -1
Is there a simple way of doing this?
Upvotes: 0
Views: 32
Reputation: 38520
Here is a pretty quick base R answer. This will be super fast as long as the data set is fairly small relative to the amount of available RAM.
dat[] <- findInterval(as.matrix(dat), vec = c(4, 27),
rightmost.closed=TRUE) - 1L
Here, since every column uses the same breaks, you can convert a copy of the data.frame to a matrix and run findInterval on it using those breaks. The rightmost.closed=TRUE argument makes a value exactly equal to the upper break (27) fall into the middle interval, so it maps to 0 rather than 1. Then, since findInterval returns values beginning with 0, subtract 1 to get the desired values.
Assigning with dat[] <- puts the resulting vector back into the data.frame, so the column names and structure are kept.
This returns
dat
var1 var2 var3
1 1 0 -1
2 0 1 -1
data
dat <-
structure(list(var1 = c(30L, 20L), var2 = c(13L, 29L), var3 = 2:3),
.Names = c("var1", "var2", "var3"), class = "data.frame",
row.names = c(NA, -2L))
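To see how the breaks behave right at the boundaries, here is a small base R check. It is only a sketch: the probe values are made up for illustration, and the breaks are the ones from the question (y = 4, x = 27).

```r
# Breaks from the question: lower threshold y = 4, upper threshold x = 27.
breaks <- c(4, 27)

# Hypothetical probe values sitting on and around both breaks.
probe <- c(3, 4, 5, 26, 27, 28)

# findInterval() gives 0 below the first break, 1 between the breaks and
# 2 above the last break. rightmost.closed = TRUE keeps a value equal to
# the last break (27) in interval 1. Subtracting 1 shifts to -1 / 0 / 1.
codes <- findInterval(probe, vec = breaks, rightmost.closed = TRUE) - 1L
print(setNames(codes, probe))

# Without the flag, 27 itself lands in the top interval and would become 1.
no_flag <- findInterval(27, vec = breaks) - 1L
print(no_flag)
```

So only values strictly above 27 become 1 and only values strictly below 4 become -1, which matches the "greater than x" and "smaller than y" wording.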
Upvotes: 2
Reputation: 5225
Here's a relatively succinct way to manage this with mutate_all and case_when from dplyr:
x <- 27
y <- 4
df %>% mutate_all(funs(case_when(. > x ~ 1, . < y ~ -1, TRUE ~ 0)))
# var1 var2 var3
# 1 1 0 -1
# 2 0 1 -1
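Note that funs() has been deprecated since dplyr 0.8.0, and mutate_all() was superseded by across() in dplyr 1.0.0. A possible equivalent for newer versions is sketched below, assuming dplyr >= 1.0.0 is installed. The names upper and lower are my own stand-ins for x and y, chosen so they are not confused with the .x pronoun inside the lambda.

```r
library(dplyr)

# Example data from the question.
df <- data.frame(var1 = c(30L, 20L), var2 = c(13L, 29L), var3 = 2:3)

upper <- 27  # "x" in the question
lower <- 4   # "y" in the question

# across(everything(), ...) applies the same recoding to every column;
# .x refers to the column currently being processed.
recoded <- df %>%
  mutate(across(everything(),
                ~ case_when(.x > upper ~ 1,
                            .x < lower ~ -1,
                            TRUE ~ 0)))

print(recoded)
```

This should give the same 1 / 0 / -1 table as the mutate_all version.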
This can also be done with nested application of ifelse, though it's less extensible (i.e. pretty quickly becomes unwieldy if your list of conditions grows):
ifelse(df > x, 1, ifelse(df < y, -1, 0))
Though since you mention that you're doing "tertiary" encoding, perhaps that's all you need.
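One thing to be aware of: df > x is a logical matrix, so the ifelse call returns a matrix rather than a data.frame. If you want a data.frame back, a minimal sketch (using the question's example data, with out as a hypothetical name for the result) is to assign into a copy with out[] <-:

```r
# Example data and thresholds from the question.
df <- data.frame(var1 = c(30L, 20L), var2 = c(13L, 29L), var3 = 2:3)
x <- 27
y <- 4

# Comparing a data.frame with a number yields a logical matrix,
# so the nested ifelse() result is a numeric matrix.
m <- ifelse(df > x, 1, ifelse(df < y, -1, 0))
print(class(m))

# Assigning into out[] keeps the data.frame structure and column names.
out <- df
out[] <- ifelse(df > x, 1, ifelse(df < y, -1, 0))
print(out)
```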
Upvotes: 1