Reputation: 1204
I want to extract the elements of one character string that are not in another character string.
What is the fastest (vectorized?) approach?
Mock data:
library(data.table)
dt <- data.table(id      = c("A", "B", "C", "D"),
                 product = c("1", "1,2", "1,2,3", "4"),
                 stock   = c("2, 3", "1,2", "1,2", "4"))
> dt
   id product stock
1:  A       1  2, 3
2:  B     1,2   1,2
3:  C   1,2,3   1,2
4:  D       4     4
What I am looking for is a new variable called new that holds the elements from product that are not in stock.
> dt
   id product stock  new
1:  A       1  2, 3    1
2:  B     1,2   1,2 <NA>
3:  C   1,2,3   1,2    3
4:  D       4     4 <NA>
Note: this seems to be the exact opposite of stringr::str_extract_all, but that function doesn't have a negate argument.
Upvotes: 1
Views: 1098
Reputation: 47320
Using only base R and data.table, we loop in parallel through both columns and use setdiff, then add the NAs and make it an atomic vector:
dt[,new:= mapply(setdiff, strsplit(product, ","), strsplit(stock, ","))]
is.na(dt$new) <- !lengths(dt$new)
dt$new <- unlist(dt$new)
dt
#>    id product stock new
#> 1:  A       1  2, 3   1
#> 2:  B     1,2   1,2  NA
#> 3:  C   1,2,3   1,2   3
#> 4:  D       4     4  NA
Here it is in pure data.table code:
dt[, new := mapply(setdiff, strsplit(product, ","), strsplit(stock, ","))][
  lengths(new) == 0, new := NA][
  , new := unlist(new)]
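One detail in the mock data is worth noting: stock for row A is "2, 3", with a space after the comma, so splitting on "," alone yields " 3" rather than "3". That does not change the result for this data, but a whitespace-tolerant split may be safer in general. Below is a minimal base R sketch of the same setdiff idea, assuming at most one leftover element per row; split_items is a hypothetical helper introduced only for illustration.

```r
# Inputs mirroring the mock data from the question
product <- c("1", "1,2", "1,2,3", "4")
stock   <- c("2, 3", "1,2", "1,2", "4")

# Hypothetical helper: split on a comma followed by optional whitespace,
# so "2, 3" yields "2" and "3" instead of "2" and " 3"
split_items <- function(x) strsplit(x, ",\\s*")

# SIMPLIFY = FALSE keeps the result a list regardless of element lengths
leftover <- mapply(setdiff, split_items(product), split_items(stock),
                   SIMPLIFY = FALSE)

# Rows with nothing left over become NA
leftover[lengths(leftover) == 0] <- NA_character_

# Flatten to a character vector (assumes at most one leftover per row)
new <- unlist(leftover)
new
# "1" NA "3" NA
```

Passing SIMPLIFY = FALSE is a design choice here: it avoids mapply returning a vector or matrix in some cases and a list in others, which makes the later steps easier to reason about.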
Upvotes: 1
Reputation: 887183
Here is one option: split the columns of interest with strsplit, then use setdiff to find the elements not in the second one. If there are no values, i.e. if the length is 0, then return NA
f1 <- function(x, y) {
  x1 <- setdiff(x, y)
  if(!length(x1)) NA_character_ else x1
}
dt[, new := do.call(Map, c(f = f1,
          unname(lapply(.SD, strsplit, ",")))), .SDcols = 2:3]
dt
#   id product stock  new
#1:  A       1  2, 3    1
#2:  B     1,2   1,2 <NA>
#3:  C   1,2,3   1,2    3
#4:  D       4     4 <NA>
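When a row has more than one leftover element, a function like f1 returns a vector of length greater than one, so new stays a list column. If a plain character column is preferred, one option is to collapse the leftovers back into a comma-separated string. The sketch below uses base R only; diff_collapsed and the two-row input are hypothetical and chosen just to show the multi-element case.

```r
# Hypothetical helper: like f1, but collapses several leftovers into one string
diff_collapsed <- function(x, y) {
  left <- setdiff(x, y)
  if (length(left) == 0) NA_character_ else paste(left, collapse = ",")
}

# Illustrative input where the first row has two leftover elements
product <- c("1,2,3", "4")
stock   <- c("1", "4")

# Map walks both split lists in parallel; each call returns exactly one string
new <- unlist(Map(diff_collapsed, strsplit(product, ","), strsplit(stock, ",")))
new
# "2,3" NA
```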
Or, if we need to use str_extract_all, the tidyverse option would be
library(tidyverse)
# f1 is the helper function defined above
dt %>%
  mutate_at(2:3, list(newvar = ~ str_extract_all(., '\\d+'))) %>%
  transmute(id, product, stock, new = map2(product_newvar, stock_newvar, f1))
Upvotes: 6