Reputation: 471
I have two columns of strings in a data frame, and for each row I want to see the characters which differ (those in a that do not appear in b).
E.g., given
Lines <- "
a b
cat car
dog ding
cow haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
I would like to return
a b diff
cat car t
dog ding o
cow haw co
I have seen two related questions where a number of neat solutions are given, but they either work only for an individual row (first reference) or act row-wise without doing exactly what I want (second reference).
Ideally I'd like to use something like this:
Reduce(setdiff, strsplit(c(a, b), split = ""))
I tried:
apply(df, function(a,b) Reduce(setdiff, strsplit(c(a, b), split = "")))
but to no avail.
How can this be done?
P.S. I'm particularly keen to do this using dplyr if possible, but only for stylistic reasons.
Upvotes: 1
Views: 1211
Reputation: 39154
A solution using tidyverse and stringr.
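library(tidyverse)
library(stringr)
dt2 <- dt %>%
  # split each string into a vector of single characters
  mutate(a_list = str_split(a, pattern = ""), b_list = str_split(b, pattern = "")) %>%
  # row by row, keep the characters of a that do not occur in b
  mutate(diff = map2(a_list, b_list, setdiff)) %>%
  # collapse each character vector back into one string
  mutate(diff = map_chr(diff, ~paste(., collapse = ""))) %>%
  # drop the helper list columns
  select_if(~!is.list(.))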
dt2
# A tibble: 3 x 3
a b diff
<chr> <chr> <chr>
1 cat car t
2 dog ding o
3 cow haw co
DATA
dt <- read.table(text = "a b
cat car
dog ding
cow haw",
header = TRUE, stringsAsFactors = FALSE)
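To see what each step of the pipeline does, here is a minimal base R sketch for a single pair of strings. The variable names a_chars, b_chars, only_in_a and row_diff are only illustrative and are not part of the answer above.

```r
# One row worked by hand: "cow" vs "haw"
a_chars <- strsplit("cow", "")[[1]]   # "c" "o" "w"
b_chars <- strsplit("haw", "")[[1]]   # "h" "a" "w"

# characters of a that never occur in b
only_in_a <- setdiff(a_chars, b_chars)   # "c" "o"

# glue them back into a single string
row_diff <- paste(only_in_a, collapse = "")
row_diff
# [1] "co"
```

The pipeline does the same thing for every row at once, with map2 standing in for the single setdiff call.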
Upvotes: 2
Reputation: 536
Using dplyr
library(dplyr)
ff <- data.frame(a = c("dog", "chair", "love"), b = c("dot", "liar", "over"), stringsAsFactors = FALSE)
st <- ff %>% mutate(diff = sapply(Map(setdiff, strsplit(a, ""), strsplit(b, "")), paste, collapse = ""))
st
a b diff
1 dog dot g
2 chair liar ch
3 love over l
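If the same comparison is needed in several places, the one-liner can be wrapped in a small helper. This is only a sketch: char_diff is a hypothetical name, and vapply is swapped in for sapply so that the result is always a character vector.

```r
# Hypothetical helper: characters of x[i] that do not occur in y[i]
char_diff <- function(x, y) {
  parts <- Map(setdiff, strsplit(x, ""), strsplit(y, ""))
  vapply(parts, paste, character(1), collapse = "", USE.NAMES = FALSE)
}

ff <- data.frame(a = c("dog", "chair", "love"),
                 b = c("dot", "liar", "over"),
                 stringsAsFactors = FALSE)
ff$diff <- char_diff(ff$a, ff$b)
ff$diff
# "g" "ch" "l"
```

Inside a dplyr pipeline the same helper should work as mutate(diff = char_diff(a, b)).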
Upvotes: 1
Reputation: 38510
Here is another base R method using Map.
diffList <- Map(setdiff, strsplit(dat[[1]], ""), strsplit(dat[[2]], ""))
diffList
[[1]]
[1] "t"
[[2]]
[1] "o"
[[3]]
[1] "c" "o"
You can wrap this in sapply to return a character vector for your data.frame:
dat$charDiffs <- sapply(diffList, paste, collapse = "")
which returns
dat
a b charDiffs
1 cat car t
2 dog ding o
3 cow haw co
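For comparison, the apply attempt from the question can also be made to work. A minimal sketch, assuming both columns are character: apply needs MARGIN = 1 to go row by row, and it hands each row to the function as one character vector rather than as separate a and b arguments. The name rowDiffs is only illustrative.

```r
dat <- data.frame(a = c("cat", "dog", "cow"),
                  b = c("car", "ding", "haw"),
                  stringsAsFactors = FALSE)

# each row arrives as a character vector c(a = ..., b = ...)
rowDiffs <- apply(dat[c("a", "b")], 1, function(row) {
  chars <- strsplit(row, "")   # list of two character vectors
  paste(Reduce(setdiff, chars), collapse = "")
})
unname(rowDiffs)
# "t" "o" "co"
```

Because apply converts the data frame to a matrix first, the Map version is probably the safer choice when the data frame also holds non-character columns.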
data (from dput)
dat <-
structure(list(a = c("cat", "dog", "cow"), b = c("car", "ding",
"haw")), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
Upvotes: 0
Reputation: 269654
Assuming df is as shown reproducibly in the Note at the end, define a function Diff which accepts two vectors of strings, runs setdiff on them and pastes the result together. Then use mapply to run it on the two columns after splitting them into individual characters.
Diff <- function(x, y) paste(setdiff(x, y), collapse = "")
transform(df, diff = mapply(Diff, strsplit(a, ""), strsplit(b, "")))
giving:
a b diff
1 cat car t
2 dog ding o
3 cow haw co
Note: The input df used above is:
Lines <- "
a b
cat car
dog ding
cow haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
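One thing to keep in mind with any setdiff-based approach (this is a general property of setdiff, not something specific to this answer): it treats its inputs as sets, so it is one-directional and ignores repeated characters and position. A short sketch using the Diff function from above, with made-up inputs and an illustrative helper named chars:

```r
Diff <- function(x, y) paste(setdiff(x, y), collapse = "")
chars <- function(s) strsplit(s, "")[[1]]   # illustrative helper

Diff(chars("dog"), chars("ding"))   # "o"  : in a but not in b
Diff(chars("ding"), chars("dog"))   # "in" : reversing the arguments changes the result
Diff(chars("book"), chars("bk"))    # "o"  : duplicates are reported once
Diff(chars("aab"), chars("ab"))     # ""   : the extra "a" is not detected
```

If position or repeated characters matter, a different comparison (for example a character-by-character check) would be needed.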
Upvotes: 2