wake_wake

Reputation: 1204

R - extract elements from a character string that do NOT match another string

I want to extract the elements of one character string that are not in another character string.

What is the fastest (vectorized?) approach?

Mock data:

library(data.table)
dt <- data.table(id = c("A", "B", "C", "D"),
             product= c("1", "1,2", "1,2,3", "4"),
             stock= c("2, 3", "1,2", "1,2", "4"))

> dt
   id product stock
1:  A       1  2, 3
2:  B     1,2   1,2
3:  C   1,2,3   1,2
4:  D       4     4

What I am looking for is a new variable called new that holds the elements from product that are not in stock.

> dt
   id product stock  new
1:  A       1  2, 3    1
2:  B     1,2   1,2 <NA>
3:  C   1,2,3   1,2    3
4:  D       4     4 <NA>

Note: this seems to be the exact opposite of stringr::str_extract_all, but that function doesn't have a negate argument.

Upvotes: 1

Views: 1098

Answers (2)

moodymudskipper

Reputation: 47320

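Using only base R and data.table, we loop in parallel through both columns and use setdiff, then fill in the NAs and turn the result into an atomic vector:

# Split on commas (ignoring any space after them), then take the row-wise set difference
dt[, new := mapply(setdiff, strsplit(product, ",\\s*"), strsplit(stock, ",\\s*"))]
# Rows with nothing left over become NA
is.na(dt$new) <- !lengths(dt$new)
# Flatten the list column into an atomic vector
dt$new <- unlist(dt$new)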
dt
#>    id product stock new
#> 1:  A       1  2, 3   1
#> 2:  B     1,2   1,2  NA
#> 3:  C   1,2,3   1,2   3
#> 4:  D       4     4  NA

Here it is in pure data.table code:

# Same steps chained: set difference, empty results to NA, then flatten
dt[, new := mapply(setdiff, strsplit(product, ",\\s*"), strsplit(stock, ",\\s*"))][
  lengths(new) == 0, new := NA][
    , new := unlist(new)]
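
Both snippets above rely on at most one element being left over per row, since unlist() would otherwise return more values than there are rows. A minimal base R sketch for the general case, assuming the leftovers should be pasted back into one comma-separated string and that whitespace around elements should be ignored (diff_elements is a hypothetical helper name, not a data.table function):

```r
# Hypothetical helper: for each pair of strings, return the elements of `x`
# that do not occur in `y`, or NA when nothing is left over.
diff_elements <- function(x, y, sep = ",") {
  # Split each string on `sep` and trim surrounding whitespace
  split_trim <- function(s) lapply(strsplit(s, sep, fixed = TRUE), trimws)
  mapply(
    function(a, b) {
      d <- setdiff(a, b)
      # Always return exactly one string per row so the result stays atomic
      if (length(d) == 0L) NA_character_ else paste(d, collapse = sep)
    },
    split_trim(x), split_trim(y),
    USE.NAMES = FALSE
  )
}

product <- c("1", "1,2", "1,2,3", "4")
stock   <- c("2, 3", "1,2", "1,2", "4")
res <- diff_elements(product, stock)
```

With the question's table this could be used as dt[, new := diff_elements(product, stock)].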

Upvotes: 1

akrun

Reputation: 887183

Here is one option: split the columns of interest with strsplit, then use setdiff to find the elements that are not in the second one. If there are no values left, i.e. if the length is 0, return NA.

# Elements of x that are not in y; NA if nothing is left over
f1 <- function(x, y) {
    x1 <- setdiff(x, y)
    if(!length(x1)) NA_character_ else x1
}

# Split columns 2 and 3 on commas (ignoring any space after them) and apply f1 row by row
dt[, new := do.call(Map, c(f = f1,
    unname(lapply(.SD, strsplit, ",\\s*")))), .SDcols = 2:3]
dt
#   id product stock  new
#1:  A       1  2, 3    1
#2:  B     1,2   1,2 <NA>
#3:  C   1,2,3   1,2    3
#4:  D       4     4 <NA>

Or if we need to use str_extract_all, the tidyverse option would be

library(tidyverse)
dt %>% 
   mutate_at(2:3, list(newvar = ~ str_extract_all(., '\\d+'))) %>%  
   transmute(id, product, stock, new = map2(product_newvar, stock_newvar, f1))
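
For comparison, the same extract-then-setdiff idea can be sketched in base R alone, using regmatches with gregexpr in place of str_extract_all. This is only an illustration on plain vectors, assuming the elements are runs of digits as in the mock data; f1 is repeated from above so the snippet runs on its own.

```r
# f1 as defined in the answer: elements of x not in y, NA if none remain
f1 <- function(x, y) {
  x1 <- setdiff(x, y)
  if (!length(x1)) NA_character_ else x1
}

product <- c("1", "1,2", "1,2,3", "4")
stock   <- c("2, 3", "1,2", "1,2", "4")

# Extract every run of digits from each string (one character vector per row)
tokens_product <- regmatches(product, gregexpr("[0-9]+", product))
tokens_stock   <- regmatches(stock,   gregexpr("[0-9]+", stock))

# Row-wise set difference; the result is a list with one entry per row
new <- Map(f1, tokens_product, tokens_stock)
```

Because the digits are extracted by pattern rather than by splitting, stray spaces after the commas do not affect the comparison.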

Upvotes: 6
