Reputation: 383
I have a data frame like this:
df_in <- data.frame(x = c('x1','x2','x3','x4'),
col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
col2 = c('www.youtube.com/searcht', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'))
What kind of grep should I use to check col1 and col2 so that, if the phrase "youtube" exists in both columns of the same row, both values in that row are replaced with NA?
Example of expected output:
> df_in <- data.frame(x = c('x1','x2','x3','x4'),
+ col1 = c('NA','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
+ col2 = c('NA', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'))
> df_in
x col1 col2
1 x1 NA NA
2 x2 NA http://www.bbcnews2.com?id=321
3 x3 https://www.yahooexample.com NA
4 x4 https://www.yahooexample2.com https://google.com/text
Upvotes: 1
Views: 80
Reputation: 12074
Not an improvement on @Jaap's answer, but an alternative:
df_in[do.call(`&`,lapply(df_in[,2:3], grepl, pattern = "youtube")), 2:3] <- 'NA'
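The do.call(`&`, ...) call ANDs the per-column grepl() results element by element, so a row is selected only when both columns match. A minimal sketch of the intermediate values, assuming the df_in from the question (rebuilt with stringsAsFactors = FALSE so the columns are character). Reduce(`&`, ...) is a swapped-in alternative that gives the same result for two columns and also accepts more than two.

```r
df_in <- data.frame(x = c('x1','x2','x3','x4'),
                    col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                    col2 = c('www.youtube.com/searcht', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
                    stringsAsFactors = FALSE)

# one logical vector per column: does the cell contain "youtube"?
hits <- lapply(df_in[, 2:3], grepl, pattern = "youtube")
# hits$col1: TRUE FALSE FALSE FALSE
# hits$col2: TRUE FALSE FALSE FALSE

# do.call(`&`, hits) evaluates hits$col1 & hits$col2, i.e. "both columns match"
both <- do.call(`&`, hits)
# Reduce() folds `&` over the list, so it also works for three or more columns
both_reduce <- Reduce(`&`, hits)

df_out <- df_in
df_out[both, 2:3] <- 'NA'
df_out
```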
Upvotes: 1
Reputation: 83225
Another option is to use rowSums with sapply and grepl:
df_in[rowSums(sapply(df_in, grepl, pattern = 'youtube')) > 1, 2:3] <- 'NA'
which gives:
> df_in
x col1 col2
1 x1 NA NA
2 x2 NA http://www.bbcnews2.com?id=321
3 x3 https://www.yahooexample.com NA
4 x4 https://www.yahooexample2.com https://google.com/text
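Here sapply() builds a logical matrix (one row per data-frame row, one column per column tested) and rowSums() counts the TRUE values in each row. The one-liner runs over every column, including x, and relies on "> 1" meaning "both URL columns". A minimal sketch, assuming the question's df_in, that restricts the test to the URL columns and compares against their count so the "all columns match" condition is explicit; url_cols is a name introduced here for illustration.

```r
df_in <- data.frame(x = c('x1','x2','x3','x4'),
                    col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                    col2 = c('www.youtube.com/searcht', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
                    stringsAsFactors = FALSE)

# only the columns that hold URLs (illustrative name)
url_cols <- c("col1", "col2")
# 4 x 2 logical matrix: does each cell contain "youtube"?
m <- sapply(df_in[url_cols], grepl, pattern = "youtube")
# number of URL columns per row that mention "youtube": 2 0 0 0 here
n_hits <- rowSums(m)

df_out <- df_in
# replace only rows where every URL column matched
df_out[n_hits == length(url_cols), url_cols] <- 'NA'
df_out
```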
Upvotes: 6
Reputation: 17648
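You can try a tidyverse solution:
library(tidyverse)
df_in %>%
  # join col1 and col2 into one column "a", separated by "_" (unite's default sep)
  unite(a, starts_with("col")) %>%
  # if "youtube" occurs more than once in the joined string, blank both parts
  mutate(a = ifelse(str_count(a, "youtube") > 1, "NA_NA", a)) %>%
  # split back into col1 and col2 at the "_" (assumes no URL contains "_")
  separate(a, c("col1", "col2"), "_")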
x col1 col2
1 x1 NA NA
2 x2 NA http://www.bbcnews2.com?id=321
3 x3 https://www.yahooexample.com NA
4 x4 https://www.yahooexample2.com https://google.com/text
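The unite()/separate() round trip assumes that no value contains "_", because separate() splits on that character. A base-R sketch of the same count-the-matches idea that never splits the values back apart: the two columns are pasted together only to count occurrences with gregexpr(), and the original columns are overwritten directly. This is a swapped-in alternative, not the tidyverse answer's code, and it assumes the question's df_in.

```r
df_in <- data.frame(x = c('x1','x2','x3','x4'),
                    col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                    col2 = c('www.youtube.com/searcht', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
                    stringsAsFactors = FALSE)

# join the two URL columns only for counting; df_in itself is not modified
joined <- paste(df_in$col1, df_in$col2)
# count "youtube" matches per row; rows without a match give character(0), i.e. 0
counts <- lengths(regmatches(joined, gregexpr("youtube", joined)))

df_out <- df_in
df_out[counts > 1, c("col1", "col2")] <- 'NA'
df_out
```

Like str_count(), this counts matches rather than columns, so a single cell that mentions "youtube" twice would also trigger the replacement; the per-column grepl() approaches avoid that.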
In base R I would do:
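df_out <- df_in
df_out[, -1] <- t(apply(df_in[, -1], 1, function(x) {
  # count how many of the two columns in this row mention "youtube"
  tmp <- grepl("youtube", x[1]) + grepl("youtube", x[2])
  # both match: blank the row; otherwise keep the original values
  if (tmp > 1) rep("NA", 2) else x
}))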
df_out
x col1 col2
1 x1 NA NA
2 x2 NA http://www.bbcnews2.com?id=321
3 x3 https://www.yahooexample.com NA
4 x4 https://www.yahooexample2.com https://google.com/text
The idea is to count the occurrences of "youtube" in each row. If there are n = 2 per row, replace the entries with NA; otherwise leave them as they are.
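The same count-per-row idea generalizes to any set of columns. A minimal sketch, assuming a hypothetical helper blank_if_all_match() (not part of any package) that replaces the selected columns only when every one of them matches the pattern. The replacement defaults to the string "NA" used in the question; passing NA_character_ gives a real missing value instead.

```r
# hypothetical helper: in rows where every column in `cols` matches `pattern`,
# overwrite those columns with `replacement`
blank_if_all_match <- function(df, cols, pattern, replacement = "NA") {
  # rows x length(cols) logical matrix (cbind keeps it a matrix even for one row)
  hits <- do.call(cbind, lapply(df[cols], grepl, pattern = pattern))
  # TRUE where the number of matching columns equals the number of columns
  all_match <- rowSums(hits) == length(cols)
  df[all_match, cols] <- replacement
  df
}

df_in <- data.frame(x = c('x1','x2','x3','x4'),
                    col1 = c('http://youtube.com/something','NA','https://www.yahooexample.com','https://www.yahooexample2.com'),
                    col2 = c('www.youtube.com/searcht', 'http://www.bbcnews2.com?id=321','NA','https://google.com/text'),
                    stringsAsFactors = FALSE)

# string "NA", as in the question
df_out <- blank_if_all_match(df_in, c("col1", "col2"), "youtube")
# real missing values instead of the string "NA"
df_na <- blank_if_all_match(df_in, c("col1", "col2"), "youtube", NA_character_)
df_out
```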
Upvotes: 3