user8831872

Reputation: 383

Replace values with NA when a text pattern exists in both of two columns

I have a data frame like this:

df_in <- data.frame(x = c('x1','x2','x3','x4'),
                    col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                    col2 = c('www.youtube.com/searcht', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'))

What kind of grep should I use to check col1 and col2 and, if the phrase "youtube" exists in both columns in the same row, replace both values with NA?

Example of expected output

> df_in <- data.frame(x = c('x1','x2','x3','x4'),
+                     col1 = c('NA','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
+                     col2 = c('NA', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'))
> df_in
   x                          col1                           col2
1 x1                            NA                             NA
2 x2                            NA http://www.bbcnews2.com?id=321
3 x3  https://www.yahooexample.com                             NA
4 x4 https://www.yahooexample2.com        https://google.com/text

Upvotes: 1

Views: 80

Answers (3)

Dan

Reputation: 12074

Not an improvement on @Jaap's answer, but an alternative:

df_in[do.call(`&`,lapply(df_in[,2:3], grepl, pattern = "youtube")), 2:3] <- 'NA'
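The one-liner above can be unpacked into separate steps. The following is only a sketch: it swaps `do.call` for `Reduce`, which combines the per-column matches pairwise and so would also cover more than two columns, and the names `hits` and `both` are introduced here for illustration. It assumes the columns are character vectors (`stringsAsFactors = FALSE`).

```r
# Example data from the question, kept as character columns
df_in <- data.frame(x = c('x1','x2','x3','x4'),
                    col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                    col2 = c('www.youtube.com/searcht', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
                    stringsAsFactors = FALSE)

# One logical vector per column: TRUE where "youtube" occurs
hits <- lapply(df_in[, 2:3], grepl, pattern = "youtube")

# Element-wise AND across the columns: TRUE only where every column matches
both <- Reduce(`&`, hits)

# Overwrite both columns in the matching rows with the string "NA"
df_in[both, 2:3] <- "NA"
```

Only the first row has "youtube" in both columns, so `both` is `TRUE` for that row alone.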

Upvotes: 1

Jaap

Reputation: 83225

Another option is to use rowSums with sapply and grepl:

df_in[rowSums(sapply(df_in, grepl, pattern = 'youtube')) > 1, 2:3] <- 'NA'

which gives:

> df_in
   x                          col1                           col2
1 x1                            NA                             NA
2 x2                            NA http://www.bbcnews2.com?id=321
3 x3  https://www.yahooexample.com                             NA
4 x4 https://www.yahooexample2.com        https://google.com/text
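Note that the call above searches every column, including `x`. A possible variant, sketched here under the assumption that only `col1` and `col2` should be checked, restricts the search to those columns and requires a match in all of them. The names `url_cols`, `hits` and `n_hits` are hypothetical.

```r
# Example data from the question, kept as character columns
df_in <- data.frame(x = c('x1','x2','x3','x4'),
                    col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                    col2 = c('www.youtube.com/searcht', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
                    stringsAsFactors = FALSE)

# Columns to inspect (an assumption: only the URL columns matter)
url_cols <- c("col1", "col2")

# Logical matrix with one column per inspected column
hits <- sapply(df_in[url_cols], grepl, pattern = "youtube")

# Number of inspected columns that contain "youtube", per row
n_hits <- rowSums(hits)

# Replace only where every inspected column matched
df_in[n_hits == length(url_cols), url_cols] <- "NA"
```

Comparing against `length(url_cols)` instead of using `> 1` keeps the "all columns match" rule intact if more columns are added to `url_cols`.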

Upvotes: 6

Roman

Reputation: 17648

You can try a tidyverse solution:

library(tidyverse)
df_in %>% 
  unite(a,starts_with("col")) %>% 
  mutate(a=ifelse(str_count(a,"youtube")>1, "NA_NA", a)) %>% 
  separate(a, c("col1","col2"), "_")
   x                          col1                           col2
1 x1                            NA                             NA
2 x2                            NA http://www.bbcnews2.com?id=321
3 x3  https://www.yahooexample.com                             NA
4 x4 https://www.yahooexample2.com        https://google.com/text

In base R I would do:

df_out <- df_in
df_out[, -1] <- t(apply(df_in[, -1], 1, function(x) {
  tmp <- grepl("youtube", x[1]) + grepl("youtube", x[2])
  if (tmp > 1) rep("NA", 2) else x
}))
df_out
   x                          col1                           col2
1 x1                            NA                             NA
2 x2                            NA http://www.bbcnews2.com?id=321
3 x3  https://www.yahooexample.com                             NA
4 x4 https://www.yahooexample2.com        https://google.com/text

The idea is to count the occurrences of "youtube" in each row. If both columns match (a count of 2), replace both entries with "NA"; otherwise leave the row as it is.
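All of the outputs above contain the string "NA" rather than a real missing value, which mirrors the question's data. If a true missing value is wanted instead, one possible sketch assigns `NA_character_`. This assumes character columns (`stringsAsFactors = FALSE`) and a literal, non-regex match (`fixed = TRUE`); the name `both` is hypothetical.

```r
# Example data from the question, kept as character columns
df_in <- data.frame(x = c('x1','x2','x3','x4'),
                    col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                    col2 = c('www.youtube.com/searcht', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
                    stringsAsFactors = FALSE)

# TRUE where "youtube" occurs literally in both columns of the same row
both <- grepl("youtube", df_in$col1, fixed = TRUE) &
  grepl("youtube", df_in$col2, fixed = TRUE)

# Assign a real missing value, which is.na() recognises
df_in[both, c("col1", "col2")] <- NA_character_

# The pre-existing "NA" strings are still ordinary text
is.na(df_in$col1)
```

After this, `is.na()` reports only the replaced cells as missing; the "NA" strings that were already in the data remain plain text unless they are converted separately.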

Upvotes: 3
