mbansal
mbansal

Reputation: 1

modify a new col value based on other col values in data frame in R

I am struggling to achieve the following:

df=
  c1 c2 c3 c4
1  1  4  2  5
2  1  2  3  4
3  3  4  5  5
4  1  2  5  5
5  2  3  5  5

for each observation(i.e. row) if any of the col i.e. col1, col2, col3, col4 has both 1&4 values or only 1 or only 4; then set c5 (a new column) to 14, 1, 4 respectively for the cases else set it to -1. So c5 value I want to achieve is

c5
14
14
4
1
-1

Wanted to achieve it using dplyr package and mutate functionality like below but its giving error.

df = df %>% mutate (c5= ifelse(c(df$c1,df$c2,df$c3,df$c4 %in% c(1,4), 14, 
                        ifelse(c(df$c1,df$c2,df$c3,df$c4 %in% c(1),1, 
                        ifelse(c(df$c1,df$c2,df$c3,df$c4 %in% c(4),-1))))

What is the right and efficient way to achieve this please ?

Upvotes: 0

Views: 85

Answers (3)

www
www

Reputation: 39154

A solution using dplyr. rowSums(. == 1) and rowSums(. == 4) are to how many elements are 1 or 4, respectively. case_when is to assign the replacement value. The idea is similar to your original multiple ifelse statement, but case_when can provide a more clear concise way to specify conditions.

library(dplyr)

df2 <- df %>%
  mutate(c5 = case_when(
    rowSums(. == 1) > 0 & rowSums(. == 4) > 0     ~ 14,
    rowSums(. == 1) > 0 & rowSums(. == 4) == 0    ~  1,
    rowSums(. == 1) == 0 & rowSums(. == 4) > 0    ~  4,
    TRUE                                          ~ -1
  ))
df2
#   c1 c2 c3 c4 c5
# 1  1  4  2  5 14
# 2  1  2  3  4 14
# 3  3  4  5  5  4
# 4  1  2  5  5  1
# 5  2  3  5  5 -1

DATA

df <- read.table(text = "  c1 c2 c3 c4
1  1  4  2  5
2  1  2  3  4
3  3  4  5  5
4  1  2  5  5
5  2  3  5  5",
                 header = TRUE)

Upvotes: 0

Onyambu
Onyambu

Reputation: 79288

This is using base R. You can use the idea with the dplyr package

   dat$c5=apply(dat,1,function(x){y=x%in%c(1,4);
                     ifelse(any(y),as.numeric(paste0(sort(unique(x[y])),collapse = "")),-1)})
      dat
   c1 c2 c3 c4 c5
 1  1  4  2  5 14
 2  1  2  3  4 14
 3  3  4  5  5  4
 4  1  2  5  5  1
 5  2  3  5  5 -1

Upvotes: 1

thelatemail
thelatemail

Reputation: 93908

Make two vectorised comparisons, then look at their interaction:

one <- rowSums(df == 1) > 0
four <- rowSums(df == 4) > 0
c(-1,1,4,14)[interaction(one,four)]
#[1] 14 14  4  1 -1

Or all in one line if you want:

c(-1,1,4,14)[interaction( rowSums(df==1) > 0, rowSums(df==4) > 0 )]
#[1] 14 14  4  1 -1

Upvotes: 2

Related Questions