Alokin
Alokin

Reputation: 505

Removing a repeated value in a row

I have two columns in a data frame that may or may not have copied values in them. If the second column has the same value as the first column, I would like to replace that value with a NULL value or a string indicating the value has been replaced. If the values are different, I want to keep both of those values. For example: I want to take this

col_1      col_2
a            a
a            b
b            d
c            c
c            d
c            c
a            a

And turn this into:

col_1      col_2
a            NULL
a            b
b            d
c            NULL
c            d
c            NULL
a            NULL

How can I do that?

Upvotes: 1

Views: 32

Answers (3)

akrun
akrun

Reputation: 887128

We can use data.table methods which is fast and efficient

library(data.table)
setDT(df)[col_1 == col_2, col_2 := 'NULL']

-output

df
#   col_1 col_2
#1:     a  NULL
#2:     a     b
#3:     b     d
#4:     c  NULL
#5:     c     d

data

df <- structure(list(col_1 = c("a", "a", "b", "c", "c"), col_2 = c("a", 
"b", "d", "c", "d")), class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 0

Karthik S
Karthik S

Reputation: 11584

By NULL value, I assume you need NA, if you need actual string NULL, you can use 'NULL' in place of NA_character_ as in Duck's answer.

library(dplyr)
df %>% 
mutate(col_2 = case_when(col_1 == col_2 ~ NA_character_, TRUE ~ col_2))
# A tibble: 5 x 2
# Rowwise: 
  col_1 col_2
  <chr> <chr>
1 a     NA   
2 a     b    
3 b     d    
4 c     NA   
5 c     d    

Based on new input:

df %>% mutate(col_2 = case_when(col_1 == col_2 ~ NA_character_, TRUE ~ col_2))
# A tibble: 7 x 2
# Rowwise: 
  col_1 col_2
  <chr> <chr>
1 a     NA   
2 a     b    
3 b     d    
4 c     NA   
5 c     d    
6 c     NA   
7 a     NA   

Data used:

df
# A tibble: 7 x 2
  col_1 col_2
  <chr> <chr>
1 a     a    
2 a     b    
3 b     d    
4 c     c    
5 c     d    
6 c     c    
7 a     a    

Upvotes: 1

Duck
Duck

Reputation: 39595

You can also try:

#Code
df$col_2 <- ifelse(df$col_2==df$col_1,'NULL',df$col_2)

Output:

df
  col_1 col_2
1     a  NULL
2     a     b
3     b     d
4     c  NULL
5     c     d

Some data used:

#Data
df <- structure(list(col_1 = c("a", "a", "b", "c", "c"), col_2 = c("a", 
"b", "d", "c", "d")), class = "data.frame", row.names = c(NA, 
-5L))

Another option can be, using correct R sintax:

#Code2
df$col_2[df$col_2==df$col_1]<-'NULL'

Same output.

Using the ifelse() approach, we get this:

df
  col_1 col_2
1     a  NULL
2     a     b
3     b     d
4     c  NULL
5     c     d
6     c  NULL
7     a  NULL

Upvotes: 2

Related Questions