Reputation: 599
I have two columns, both of character data type. One column has plain strings and the other has strings wrapped in quotes. I want to compare both columns and find the number of distinct names across the data frame.
string   f.string.name
john     NA
bravo    NA
NA       "john"
NA       "hulk"
Here the count should be 2, as john is common.
Somehow I am not able to remove the quotes from the second column. Not sure why.
Thanks
Upvotes: 0
Views: 90
Reputation: 13581
library(stringr)
table(str_replace_all(unlist(df), '["]', ''))
# bravo  hulk  john
#     1     1     2
Upvotes: 1
Reputation: 76402
The main problem I'm seeing is the NA values.
First, let's get rid of the quotes you mention.
dat$f.string.name <- gsub('["]', '', dat$f.string.name)
Now, count the number of distinct values.
i1 <- complete.cases(dat$string)
i2 <- complete.cases(dat$f.string.name)
sum(dat$string[i1] %in% dat$f.string.name[i2]) + sum(dat$f.string.name[i2] %in% dat$string[i1])
DATA
dat <-
structure(list(string = c("john", "bravo", NA, NA), f.string.name = c(NA,
NA, "\"john\"", "\"hulk\"")), .Names = c("string", "f.string.name"
), class = "data.frame", row.names = c(NA, -4L))
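As a cross-check on the counting step, here is a minimal base R sketch that separates two numbers: how many distinct names appear in either column, and how many appear in both. It rebuilds the same data with `data.frame()` so it runs on its own; the names `clean`, `s1`, `s2`, `n_distinct` and `n_common` are made up for this illustration.

```r
# Same values as the DATA block, rebuilt so the sketch is self-contained
dat <- data.frame(
  string = c("john", "bravo", NA, NA),
  f.string.name = c(NA, NA, "\"john\"", "\"hulk\""),
  stringsAsFactors = FALSE
)

# Strip literal double quotes; fixed = TRUE treats '"' as a plain character.
# gsub() leaves NA entries as NA.
clean <- gsub('"', '', dat$f.string.name, fixed = TRUE)

# Drop the NAs from each column before comparing, otherwise NA itself
# would be treated as a shared value by intersect()
s1 <- dat$string[!is.na(dat$string)]
s2 <- clean[!is.na(clean)]

n_distinct <- length(unique(c(s1, s2)))  # names seen in either column
n_common   <- length(intersect(s1, s2))  # names seen in both columns

n_distinct  # 3 (john, bravo, hulk)
n_common    # 1 (john)
```

Note that the `%in%` expression above counts a shared name once per column, which is why john contributes 2 there. Which of these numbers is the right one depends on what "distinct names" is meant to cover.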
Upvotes: 2