William Brunzell
William Brunzell

Reputation: 31

Check row wise if value is present in column and update new column row wise

In R, how can I check per row if any value in a list (eg. 2, 3 or 4) is present in any one of three columns and then change that row in a fourth column?

Say I have a df:

 A B C D
 1 1 1
 2 1 1
 3 1 1

I would like to write (without a for-loop) if row n (column A or B or C) == 2 or 3 or 4 then D[1,]=1, else = 0

Basically check row wise if my numbers are present in any of three specific columns and, if so, update a forth column with 1, if not a 0.

Thanks,

Upvotes: 2

Views: 3440

Answers (6)

markus
markus

Reputation: 26343

With only these three condition you could do

df1$D <- as.integer(rowSums(df1 == 2 | df1 == 3 | df1 == 4) >= 1) # or maybe df1 >=2 & df1 <= 4
df1
#  A B C D
#1 1 1 1 0
#2 2 1 1 1
#3 3 1 1 1

A more general approach could be

vec <- 2:4
df1$D <- Reduce(`+`, lapply(vec, \(x) rowSums(df1 == x) > 0))

data

df1 <- structure(list(A = 1:3, B = c(1L, 1L, 1L), C = c(1L, 1L, 1L)), .Names = c("A", 
"B", "C"), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 1

OTStats
OTStats

Reputation: 1868

Here is how you can use dplyr:

library(dplyr)
test <- data.frame(A = c(1, 2, 3), 
                   B = c(1, 1, 1), 
                   C = c(1, 1, 1))

testColumns <- c(2, 3, 4)                         # Values you want to flag 

Now that we have our data frame and a vector with the values we want to flag in a new column, let's use rowwise() to tell R to look at each row of the data frame, and then a combination of mutate() to create a new column, D, based off of various cases.
We specify the test cases and then their desired values using case_when().

Here's how we do it:

test <- test %>% 
  rowwise() %>%                                   # Look at test on a 'by row' basis'
  mutate(D = case_when(A %in% testColumns ~ 1,    # use mutate to create a new column D
                       B %in% testColumns ~ 1,
                       C %in% testColumns ~ 1, 
                       TRUE               ~ 0))

This gives us the following table:

print(test)
## A tibble: 3 x 4
#      A     B     C     D
#  <dbl> <dbl> <dbl> <dbl>
#1     1     1     1     0
#2     2     1     1     1
#3     3     1     1     1

Here are some helpful links for a few of the functions we used:
mutate()
rowwise()
case_when()

Upvotes: 2

moodymudskipper
moodymudskipper

Reputation: 47310

You could use apply :

vec <- 2:4
df1$D <- apply(df1,1, function(x) any(vec %in% x)) +0
#   A B C D
# 1 1 1 1 0
# 2 2 1 1 1
# 3 3 1 1 1

Or a tidyverse version, possibly more efficient as apply involve some matrix conversions :

library(tidyverse)
df1 %>% mutate(D = pmap_int(.,~any(vec %in% .)))
#   A B C D
# 1 1 1 1 0
# 2 2 1 1 1
# 3 3 1 1 1

data

df1 <- data.frame(A = c(1, 2,3), 
                   B = c(1, 1, 1), 
                   C = c(1, 1, 1))

Upvotes: 2

Walker in the City
Walker in the City

Reputation: 587

Here is a way to do it using data.table:

library(data.table)
test <- data.table(A = c(1, 2,3), 
                   B = c(1, 1, 1), 
                   C = c(1, 1, 1))
checkValues <- c(2, 3, 4)

test[, c("D"):= Reduce(`|`, lapply(.SD, function(x){x %in% checkValues}))]

test
   A B C     D
1: 1 1 1 FALSE
2: 2 1 1  TRUE
3: 3 1 1  TRUE

It would be easy to replace FALSE=0, TRUE=1 (replace Reduce(|, lapply(.SD, function(x){x %in% c(2, 3, 4)})) with as.numeric(Reduce(|, lapply(.SD, function(x){x %in% c(2, 3, 4)}))), but it seems that you are using D to hold a logical value so it would make sense to me to have it as a logical vector.

This also updates test to have column D by reference, which is more efficient.

Two answers to perhaps look at as well would be: Finding rows containing a value (or values) in any column and Add multiple columns to R data.table in one function call?

Upvotes: 0

Eric
Eric

Reputation: 3473

Parameterized for column names and numbers of interest.

library(tidyverse)

data <-
  data.frame(
    A = c(1, 2, 3), 
    B = c(1, 1, 1), 
    C = c(1, 1, 1)
  )

nums <- c(2, 3, 4)
cols <- c('A', 'B', 'C')

data$D <-
  data[, cols] %>%
  map(~.x %in% nums) %>%
  reduce(`|`)

Upvotes: 0

tmfmnk
tmfmnk

Reputation: 39858

One way to do it in tidyverse:

df %>%
 rowid_to_column() %>% #Creating an unique row ID
 gather(var, val, -rowid) %>% #Transforming the data from wide to long
 group_by(rowid) %>% #Grouping
 mutate(D = ifelse(any(val %in% c(2, 3, 4)), 1, 0)) %>% #Testing whether any value from a given row is in the specified list 
 spread(var, val) %>% #Returning the data to wide format
 ungroup() %>%
 select(-rowid) #Deleting the redundant variable

      D     A     B     C
  <dbl> <int> <int> <int>
1    0.     1     1     1
2    1.     2     1     1
3    1.     3     1     1

Upvotes: 0

Related Questions