TMrtSmith

Reputation: 471

Row-wise extract characters that differ between two strings

I have two columns of strings in a dataframe, and for each row I want to see the characters which differ.

E.g., given

Lines <- "
a     b
cat   car
dog   ding
cow   haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)

return

a     b     diff
cat   car   t
dog   ding  o
cow   haw   co

I have seen

Extract characters that differ between two strings

as well as

Split comma-separated column into separate rows

where a number of neat solutions are given, which either work for an individual row (first reference) or act row-wise but don't do exactly what I want (second reference).

Ideally I'd like to use something like this:

Reduce(setdiff, strsplit(c(a, b), split = ""))

I tried:

apply(df, function(a,b) Reduce(setdiff, strsplit(c(a, b), split = "")))

but to no avail.

How can this be done?

P.S. I'm particularly keen to do this using dplyr if possible, but only for stylistic reasons.

Upvotes: 1

Views: 1211

Answers (4)

www

Reputation: 39154

A solution from tidyverse and stringr.

library(tidyverse)
library(stringr)

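dt2 <- dt %>%
  # split each string into a list-column of single characters
  mutate(a_list = str_split(a, pattern = ""), b_list = str_split(b, pattern = "")) %>%
  # characters of a that do not occur in b, row by row
  mutate(diff = map2(a_list, b_list, setdiff)) %>%
  # collapse each character vector back into one string
  mutate(diff = map_chr(diff, ~paste(., collapse = ""))) %>%
  # drop the helper list-columns
  select_if(~!is.list(.))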
dt2
# A tibble: 3 x 3
      a     b  diff
  <chr> <chr> <chr>
1   cat   car     t
2   dog  ding     o
3   cow   haw    co

DATA

dt <- read.table(text = "a     b
cat   car
dog   ding
cow   haw",
                 header = TRUE, stringsAsFactors = FALSE)

Upvotes: 2

Sai Prabhanjan Reddy

Reputation: 536

Using dplyr

library(dplyr)
ff <- data.frame(a = c("dog", "chair", "love"), b = c("dot", "liar", "over"), stringsAsFactors = FALSE)
st <- ff %>% mutate(diff = sapply(Map(setdiff, strsplit(a, ""), strsplit(b, "")), paste, collapse = ""))

> st
      a    b diff
1   dog  dot    g
2 chair liar   ch
3  love over    l

Upvotes: 1

lmo

Reputation: 38510

Here is another base R method using Map.

diffList <- Map(setdiff, strsplit(dat[[1]], ""), strsplit(dat[[2]], ""))
diffList
[[1]]
[1] "t"

[[2]]
[1] "o"

[[3]]
[1] "c" "o"

You can wrap this in sapply to return a character vector for your data.frame:

dat$charDiffs <- sapply(diffList, paste, collapse = "")

which returns

dat
    a    b charDiffs
1 cat  car         t
2 dog ding         o
3 cow  haw        co

data (from dput)

dat <- 
structure(list(a = c("cat", "dog", "cow"), b = c("car", "ding", 
"haw")), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
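
One thing to keep in mind with any setdiff-based approach: setdiff treats its arguments as sets, so repeated characters and character positions are ignored. A minimal sketch of what that means in practice (chars and char_diff are hypothetical helper names used only for this illustration, not part of the answer above):

```r
# Split one string into a vector of single characters
chars <- function(s) strsplit(s, "")[[1]]

# Hypothetical helper: characters of x that do not occur anywhere in y
char_diff <- function(x, y) paste(setdiff(chars(x), chars(y)), collapse = "")

char_diff("aab", "ab")      # "": the extra "a" is not reported
char_diff("abc", "cba")     # "": order is ignored
char_diff("hello", "help")  # "o"
```

If a position-by-position comparison is what you are after, setdiff is probably not the right tool.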

Upvotes: 0

G. Grothendieck

Reputation: 269654

Assuming df is as shown reproducibly in the Note at the end, define a function Diff that accepts two character vectors, runs setdiff on them and pastes the result together. Then use mapply to run it on the two columns after splitting them into individual characters.

Diff <- function(x, y) paste(setdiff(x, y), collapse = "")
transform(df, diff = mapply(Diff, strsplit(a, ""), strsplit(b, "")))

giving:

    a    b diff
1 cat  car    t
2 dog ding    o
3 cow  haw   co

Note: The input df used above is:

Lines <- "
a     b
cat   car
dog   ding
cow   haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
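
For comparison, the apply attempt from the question can probably be made to work along these lines. This is only a sketch, assuming the two columns are named a and b as in the example: apply needs MARGIN = 1 to iterate over rows, and it passes each row to the function as a single named character vector rather than as two separate arguments. The data frame is rebuilt here so the block runs on its own.

```r
# Rebuild the question's example data so this block is self-contained
df <- data.frame(a = c("cat", "dog", "cow"),
                 b = c("car", "ding", "haw"),
                 stringsAsFactors = FALSE)

# Same helper as above: characters in x that are not in y, pasted together
Diff <- function(x, y) paste(setdiff(x, y), collapse = "")

# MARGIN = 1 iterates over rows; r is one row as a named character vector
row_diff <- apply(df, 1, function(r) {
  Diff(strsplit(r[["a"]], "")[[1]], strsplit(r[["b"]], "")[[1]])
})

df$diff <- unname(row_diff)
df
```

Since apply converts the data frame to a character matrix first, the mapply version above is likely the safer choice when the data frame also holds non-character columns.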

Upvotes: 2
