Reputation: 370
Here is code to generate some sample data:
set.seed(1000)
dat <- data.frame(A = runif(5, -5, 5),
B = runif(5, -5, 5),
C = runif(5, -5, 5))
Data:
A B C
1 -1.721213 -4.3226204 -1.500625
2 2.588465 2.3871486 2.554616
3 -3.860636 0.8353505 -1.829158
4 1.907551 -2.8422860 3.658128
5 0.164024 -2.4387760 2.641607
I want to compare each column and return a TRUE
or FALSE
depending on whether or not there there a negative or positive value as we go down by row. For example, in row 4 of columns A and B, we would get TRUE
. Because one is positive and the other is negative. If it were swapped (a negative value in A for that row but positive in B) then it would also return TRUE
.
However, if either of the two values being compared is between 1 to -1, the comparison won't happen and it will just return NA
. Here is what the final output should look like for this example:
A B C AB BC AC
1 -1.721213 -4.3226204 -1.500625 FALSE FALSE FALSE
2 2.588465 2.3871486 2.554616 FALSE FALSE FALSE
3 -3.860636 0.8353505 -1.829158 NA NA FALSE
4 1.907551 -2.8422860 3.658128 TRUE TRUE FALSE
5 0.164024 -2.4387760 2.641607 NA TRUE NA
I have tried to compare rows using this kind of logic:
if((dat$A > 1 & datB < -1) | (dat$A < -1 & dat$B > -1) == TRUE)
...
But I think there must be a more efficient way of doing it.
There are many great answers here that all have worked when I tested them out. I like the answer by mpalanco the best because of its readability and succinctness. However, I have chosen the answer by DMC since it can be generalized to other cases and may be useful for someone else searching for an answer to this or similar questions.
Upvotes: 4
Views: 2994
Reputation: 13570
A very basic step by step approach:
# To keep original data
dat2 <- dat
# Assign NA to values between -1 and -1
dat[ifelse(dat >-1 & dat < 1, TRUE, FALSE)] <- NA
# Create three new columns
dat[4:6] <- data.frame(AB = dat$A*dat$B, BC = dat$B*dat$C, AC = dat$A*dat$C)
# If the are positive will be FALSE
dat[4:6] <- ifelse(dat[, 4:6] > 0, FALSE, TRUE)
# Final result
cbind(dat2[, 1:3], dat[, 4:6])
Output:
A B C AB BC AC
1 -1.721213 -4.3226204 -1.500625 FALSE FALSE FALSE
2 2.588465 2.3871486 2.554616 FALSE FALSE FALSE
3 -3.860636 0.8353505 -1.829158 NA NA FALSE
4 1.907551 -2.8422860 3.658128 TRUE TRUE FALSE
5 0.164024 -2.4387760 2.641607 NA TRUE NA
Upvotes: 1
Reputation: 9123
Here is a result that generalizes in case you have more columns than just {A, B, C}
and you want to look at all pairs of columns:
library(tidyr)
library(dplyr)
# get original names
orig_names <- names(dat)
# add a row indicator
dat <- dat %>% mutate(k = row_number())
dat2 <- dat %>%
# reshape to long
gather(letter, value, A:C) %>%
# change value to {-1, 1}
mutate(
value = ifelse(value <= -1, -1, ifelse(value >= 1, 1, NA)),
letter = as.character(letter)
)
# create a placeholder data frame for result
d_new <- expand.grid(
V1 = orig_names,
V2 = orig_names,
k = 1:nrow(dat),
stringsAsFactors = FALSE
) %>%
filter(V1 < V2)
# compute result
result <- d_new %>%
left_join(dat2, by = c("V1" = "letter", "k" = "k")) %>%
left_join(dat2, by = c("V2" = "letter", "k" = "k")) %>%
mutate(
result = (value.x != value.y),
combo = paste0(V1, V2)
) %>%
select(-starts_with("value"), -V1, -V2) %>%
spread(combo, result)
# join with original data
dat %>% left_join(result)
## A B C k AB AC BC
## 1 -1.721213 -4.3226204 -1.500625 1 FALSE FALSE FALSE
## 2 2.588465 2.3871486 2.554616 2 FALSE FALSE FALSE
## 3 -3.860636 0.8353505 -1.829158 3 NA FALSE NA
## 4 1.907551 -2.8422860 3.658128 4 TRUE FALSE TRUE
## 5 0.164024 -2.4387760 2.641607 5 NA NA TRUE
Upvotes: 2
Reputation: 1932
This would be my solution.
dat[dat$A*dat$B<0 & abs(dat$A)>=1 & abs(dat$B)>=1,4]<-TRUE
dat[dat$A*dat$B>0 & abs(dat$A)>=1 & abs(dat$B)>=1,4]<-FALSE
dat[dat$C*dat$B<0 & abs(dat$C)>=1 & abs(dat$B)>=1,5]<-TRUE
dat[dat$C*dat$B>0 & abs(dat$C)>=1 & abs(dat$B)>=1,5]<-FALSE
dat[dat$A*dat$C<0 & abs(dat$A)>=1 & abs(dat$C)>=1,6]<-TRUE
dat[dat$A*dat$C>0 & abs(dat$A)>=1 & abs(dat$C)>=1,6]<-FALSE
colnames(dat)<-c("A","B","C","AB","BC","AC")
Upvotes: 1
Reputation: 20463
It might be best to use some functions:
is_between <- function(x, a, b) {
x > a & x < b
}
makeCol <- function(col1, col2) {
ifelse(
is_between(col1, -1, 1) | is_between(col2, -1, 1),
NA,
!as.logical(sign(col1) + sign(col2))
)
}
dat$AB <- makeCol(dat$A, dat$B)
dat$BC <- makeCol(dat$B, dat$C)
dat$AC <- makeCol(dat$A, dat$C)
To make this more DRY, you could follow @akrun's lead in his comment and do something like:
combn(seq_along(dat), 2, function(x) { makeCol(dat[, x[1]], dat[, x[2]]) })
Upvotes: 2