Reputation: 219
I have a simple data frame with two columns and two rows. I am trying to iterate through each row to find what words are in column two that are not in column one. Sample data:
testdata <- data.frame(rbind(one = c("mango rasberry","mango rasberry blueberry"),
two = c("kiwi strawberry","kiwi strawberry passionfruit")))
So, the output should be a third column added to testdata that contains "blueberry" in row 1 and "passionfruit" in row 2.
This is the function that I have so far:
extract <- function(input) {
extra<- apply(x, function(x) x[setdiff(unlist(str_split(input[,1]," ")), unlist(str_split(input[,2]," ")))])
extra
}
I'm getting the following error:
"argument "FUN" is missing, with no default "
Do you know what a good solution to this would be? Thank you for your help.
Upvotes: 0
Views: 61
Reputation: 13274
Try:
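testdata <- data.frame(rbind(one = c("mango rasberry","mango rasberry blueberry"),
                             two = c("kiwi strawberry","kiwi strawberry passionfruit")),
                       stringsAsFactors = FALSE)
testdata$differences <- apply(testdata, 1, function(x) {
  # Split each cell into its individual words
  x1 <- unlist(strsplit(x[1], split = " "))
  x2 <- unlist(strsplit(x[2], split = " "))
  # Put the longer vector first; if/else keeps every differing word,
  # whereas ifelse() would return only the first one
  diffs <- if (length(x1) > length(x2)) base::setdiff(x1, x2) else base::setdiff(x2, x1)
  paste(diffs, collapse = " ")
})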
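The problem is that setdiff() is not symmetric: setdiff(x, y) returns only the elements of x that are not in y. If every element of the first argument is matched in the second, it returns nothing, even when the second argument contains extra words. So, in this case, the vector with more elements should be the first argument. The error message itself has a separate cause: apply() was called without a MARGIN argument, so the function was matched to MARGIN and FUN was left missing.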
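To see the asymmetry concretely, here is a minimal sketch using the words from the first row of the sample data (x1 and x2 mirror the names used in the code above; d12 and d21 are just illustrative names):

```r
# Words from row 1 of the sample data
x1 <- c("mango", "rasberry")
x2 <- c("mango", "rasberry", "blueberry")

# Elements of x1 that are not in x2: there are none
d12 <- setdiff(x1, x2)

# Elements of x2 that are not in x1
d21 <- setdiff(x2, x1)

print(d12)  # character(0)
print(d21)  # "blueberry"
```

Swapping the arguments changes the result, which is why the order matters here.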
You could also have done it by taking the difference of the union() and the intersect(), as follows:
apply(testdata, 1, function(x) {
x1 <- unlist(strsplit(x[1], split = " "))
x2 <- unlist(strsplit(x[2], split = " "))
base::setdiff(base::union(x1,x2), base::intersect(x1,x2))
})
Desired output:
X1               X2                            differences
mango rasberry   mango rasberry blueberry      blueberry
kiwi strawberry  kiwi strawberry passionfruit  passionfruit
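If you only want the words that are in the second column but not in the first, as the question describes, you could also skip the row-wise apply() and use mapply() over the two columns. This is only a rough sketch, not the one right way to do it: the function name extra_words and the column name extra are made up for illustration, and I am assuming that several extra words should be pasted together with spaces.

```r
# Sample data from the question, kept as character columns
testdata <- data.frame(rbind(one = c("mango rasberry", "mango rasberry blueberry"),
                             two = c("kiwi strawberry", "kiwi strawberry passionfruit")),
                       stringsAsFactors = FALSE)

# Hypothetical helper: words in `second` that are missing from `first`
extra_words <- function(first, second) {
  mapply(function(a, b) {
    words_a <- unlist(strsplit(a, split = " "))
    words_b <- unlist(strsplit(b, split = " "))
    # Collapse so that each row yields exactly one string
    paste(setdiff(words_b, words_a), collapse = " ")
  }, first, second, USE.NAMES = FALSE)
}

testdata$extra <- extra_words(testdata[[1]], testdata[[2]])
print(testdata)
```

Because the direction of the comparison is fixed, this version does not depend on which cell happens to contain more words.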
I hope this helps.
Upvotes: 1