R - extract elements from a character string that do NOT match another string

Question

I want to extract the elements of one character string that are not in another character string.

What is the fastest (vectorized?) approach?

Mock data:

library(data.table)
dt <- data.table(id = c("A", "B", "C", "D"),
             product= c("1", "1,2", "1,2,3", "4"),
             stock= c("2, 3", "1,2", "1,2", "4"))

> dt
   id product stock
1:  A       1  2, 3
2:  B     1,2   1,2
3:  C   1,2,3   1,2
4:  D       4     4

What I am looking for is a new variable called new that holds the elements from product that are not in stock.

> dt
   id product stock  new
1:  A       1  2, 3    1
2:  B     1,2   1,2 
3:  C   1,2,3   1,2    3
4:  D       4     4

Note: this seems to be the exact opposite of stringr::str_extract_all, but that function doesn't have a negate argument.

akrun · Accepted Answer

Here is one option: split the columns of interest with strsplit, then use setdiff to find the elements of the first that are not in the second. If no values are left, i.e. if the length is 0, return NA.

f1 <- function(x, y) {
    # elements of x that do not occur in y
    x1 <- setdiff(x, y)
    # return NA when nothing is left
    if (!length(x1)) NA_character_ else x1
}

dt[, new := do.call(Map, c(f = f1,
    unname(lapply(.SD, strsplit, ",")))), .SDcols = 2:3]
dt
#   id product stock new
#1:  A       1  2, 3   1
#2:  B     1,2   1,2  NA
#3:  C   1,2,3   1,2   3
#4:  D       4     4  NA
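
Splitting on "," alone leaves a leading space in elements such as " 3" from "2, 3". That is harmless for the mock data, but it would stop "3" from matching " 3". As a minimal base R sketch, assuming the result should be a plain character column with an empty string where nothing is left (as in the question's desired output), the split can also absorb the whitespace. The names dat, split_items and diff_elements are hypothetical and used only for illustration:

```r
# Mock data from the question, as a plain data.frame
dat <- data.frame(id = c("A", "B", "C", "D"),
                  product = c("1", "1,2", "1,2,3", "4"),
                  stock = c("2, 3", "1,2", "1,2", "4"),
                  stringsAsFactors = FALSE)

# Split on a comma plus any surrounding whitespace,
# so "2, 3" yields "2" and "3" rather than "2" and " 3"
split_items <- function(x) strsplit(trimws(x), "\\s*,\\s*")

# For each row, keep the product elements that are absent from stock
# and collapse them back into one string ("" if none are left)
diff_elements <- function(product, stock) {
  mapply(function(p, s) paste(setdiff(p, s), collapse = ","),
         split_items(product), split_items(stock),
         USE.NAMES = FALSE)
}

dat$new <- diff_elements(dat$product, dat$stock)
```

On the mock data this gives "1", "", "3", "" for rows A to D. The same function could be used inside data.table as dt[, new := diff_elements(product, stock)].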

Or if we need to use str_extract_all, the tidyverse option would be

library(tidyverse)
dt %>% 
   mutate_at(2:3, list(newvar = ~ str_extract_all(., '\\d+'))) %>%  
   transmute(id, product, stock, new = map2(product_newvar, stock_newvar, f1))
