Reputation: 31
In R, how can I check per row if any value in a list (eg. 2, 3 or 4) is present in any one of three columns and then change that row in a fourth column?
Say I have a df:
A B C D
1 1 1
2 1 1
3 1 1
I would like to write (without a for-loop): if, in row n, column A or B or C equals 2 or 3 or 4, then D is 1 for that row, else 0.
Basically, check row-wise whether any of my numbers is present in any of three specific columns and, if so, set a fourth column to 1; if not, to 0.
Thanks,
Upvotes: 2
Views: 3440
Reputation: 26343
With only these three conditions you could do
df1$D <- as.integer(rowSums(df1 == 2 | df1 == 3 | df1 == 4) >= 1) # or maybe df1 >=2 & df1 <= 4
df1
# A B C D
#1 1 1 1 0
#2 2 1 1 1
#3 3 1 1 1
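One caveat worth sketching: `==` propagates `NA`, so a missing value in any checked column can turn the flag into `NA`, while `%in%` treats `NA` as "no match". The data frame `df_na` below is a made-up variant of the example with one missing value, used only for illustration.

```r
# Hypothetical variant of the example data with a missing value in A
df_na <- data.frame(A = c(1, NA, 3), B = c(1, 1, 1), C = c(1, 1, 1))
vec <- 2:4

# Comparison with ==: NA == 2 is NA, so the row sum (and the flag) becomes NA
eq_flag <- as.integer(rowSums(df_na == 2 | df_na == 3 | df_na == 4) >= 1)
eq_flag
# 0 NA 1

# Comparison with %in%: NA simply does not match, so the flag stays 0/1
in_flag <- as.integer(Reduce(`|`, lapply(df_na, `%in%`, vec)))
in_flag
# 0 0 1
```

Which behaviour is preferable depends on whether a missing value should count as "unknown" or as "not a match".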
A more general approach could be
vec <- 2:4
df1$D <- as.integer(Reduce(`|`, lapply(vec, \(x) rowSums(df1 == x) > 0))) # `|` keeps D at 0/1 even if a row matches several values
data
df1 <- structure(list(A = 1:3, B = c(1L, 1L, 1L), C = c(1L, 1L, 1L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 1
Reputation: 1868
Here is how you can use dplyr:
library(dplyr)
test <- data.frame(A = c(1, 2, 3),
B = c(1, 1, 1),
C = c(1, 1, 1))
testColumns <- c(2, 3, 4) # Values you want to flag
Now that we have our data frame and a vector with the values we want to flag in a new column, let's use rowwise() to tell R to look at each row of the data frame, and then mutate() to create a new column, D, based on various cases.
We specify the test cases and their desired values using case_when().
Here's how we do it:
test <- test %>%
rowwise() %>% # Look at test on a 'by row' basis
mutate(D = case_when(A %in% testColumns ~ 1, # use mutate to create a new column D
B %in% testColumns ~ 1,
C %in% testColumns ~ 1,
TRUE ~ 0))
This gives us the following table:
print(test)
## A tibble: 3 x 4
# A B C D
# <dbl> <dbl> <dbl> <dbl>
#1 1 1 1 0
#2 2 1 1 1
#3 3 1 1 1
Here are some helpful links for a few of the functions we used:
mutate()
rowwise()
case_when()
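Since `%in%` is already vectorized, the same flag can probably be computed without `rowwise()`. Here is a minimal sketch using `if_any()`, assuming dplyr 1.0.4 or later is installed; the data and the `testColumns` vector are the ones from above.

```r
library(dplyr)

test <- data.frame(A = c(1, 2, 3),
                   B = c(1, 1, 1),
                   C = c(1, 1, 1))
testColumns <- c(2, 3, 4) # Values you want to flag

# if_any() is TRUE for a row when the condition holds in at least one
# of the selected columns; as.integer() turns TRUE/FALSE into 1/0
test <- test %>%
  mutate(D = as.integer(if_any(c(A, B, C), ~ .x %in% testColumns)))

test$D
# 0 1 1
```

This avoids listing one `case_when()` branch per column, which helps when the set of columns grows.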
Upvotes: 2
Reputation: 47310
You could use apply:
vec <- 2:4
df1$D <- apply(df1,1, function(x) any(vec %in% x)) +0
# A B C D
# 1 1 1 1 0
# 2 2 1 1 1
# 3 3 1 1 1
Or a tidyverse version, possibly more efficient since apply involves a matrix conversion:
library(tidyverse)
df1 %>% mutate(D = pmap_int(., ~ as.integer(any(vec %in% c(...))))) # c(...) collects every column; a bare . is only the first one
# A B C D
# 1 1 1 1 0
# 2 2 1 1 1
# 3 3 1 1 1
data
df1 <- data.frame(A = c(1, 2,3),
B = c(1, 1, 1),
C = c(1, 1, 1))
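One thing to watch in a purrr formula passed to pmap(): `.` is bound to the first argument only, so checking every column needs `c(...)`. The sketch below uses a made-up data frame `df2`, where the only match sits in column B, to show the difference.

```r
library(purrr)

# Hypothetical data: the only value from vec (3) is in column B, row 2
df2 <- data.frame(A = c(1, 1), B = c(1, 3), C = c(1, 1))
vec <- 2:4

# `.` is the first argument (column A), so B and C are never inspected
first_only <- pmap_lgl(df2, ~ any(vec %in% .))
first_only
# FALSE FALSE

# c(...) gathers the values of all columns for the current row
all_cols <- pmap_lgl(df2, ~ any(vec %in% c(...)))
all_cols
# FALSE TRUE
```

With the original example data both forms happen to agree, because every match is in column A.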
Upvotes: 2
Reputation: 587
Here is a way to do it using data.table:
library(data.table)
test <- data.table(A = c(1, 2,3),
B = c(1, 1, 1),
C = c(1, 1, 1))
checkValues <- c(2, 3, 4)
test[, c("D"):= Reduce(`|`, lapply(.SD, function(x){x %in% checkValues}))]
test
A B C D
1: 1 1 1 FALSE
2: 2 1 1 TRUE
3: 3 1 1 TRUE
It would be easy to replace FALSE with 0 and TRUE with 1 (replace Reduce(`|`, lapply(.SD, function(x){x %in% c(2, 3, 4)})) with as.numeric(Reduce(`|`, lapply(.SD, function(x){x %in% c(2, 3, 4)})))), but it seems that you are using D to hold a logical value, so it would make sense to me to keep it as a logical vector.
This also updates test to have column D by reference, which is more efficient.
Two answers to perhaps look at as well would be: Finding rows containing a value (or values) in any column and Add multiple columns to R data.table in one function call?
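If the table has other columns that should not be checked, `.SDcols` can restrict `.SD` to the three columns of interest. This is a minimal sketch; the extra `id` column is made up for illustration and deliberately contains the flagged values.

```r
library(data.table)

# Hypothetical table with an extra id column that must be ignored
test <- data.table(id = c(2, 3, 4),
                   A = c(1, 2, 3),
                   B = c(1, 1, 1),
                   C = c(1, 1, 1))
checkValues <- c(2, 3, 4)

# .SDcols limits .SD to A, B and C, so the values in id do not trigger the flag
test[, D := as.integer(Reduce(`|`, lapply(.SD, `%in%`, checkValues))),
     .SDcols = c("A", "B", "C")]

test$D
# 0 1 1
```

Dropping `as.integer()` keeps D as a logical column, as in the version above.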
Upvotes: 0
Reputation: 3473
Parameterized for column names and numbers of interest.
library(tidyverse)
data <-
data.frame(
A = c(1, 2, 3),
B = c(1, 1, 1),
C = c(1, 1, 1)
)
nums <- c(2, 3, 4)
cols <- c('A', 'B', 'C')
data$D <-
data[, cols] %>%
map(~.x %in% nums) %>%
reduce(`|`)
Upvotes: 0
Reputation: 39858
One way to do it in tidyverse:
df %>%
rowid_to_column() %>% #Creating a unique row ID
gather(var, val, -rowid) %>% #Transforming the data from wide to long
group_by(rowid) %>% #Grouping
mutate(D = ifelse(any(val %in% c(2, 3, 4)), 1, 0)) %>% #Testing whether any value from a given row is in the specified list
spread(var, val) %>% #Returning the data to wide format
ungroup() %>%
select(-rowid) #Deleting the redundant variable
D A B C
<dbl> <int> <int> <int>
1 0. 1 1 1
2 1. 2 1 1
3 1. 3 1 1
Upvotes: 0