Earthshaker

Reputation: 599

Find the count of distinct values across two columns in R

I have two columns, both of character data type. One column holds plain strings and the other holds strings wrapped in quotes. I want to compare the two columns and find the number of distinct names across the data frame.

string    f.string.name
john      NA
bravo     NA
NA        "john"
NA        "hulk"

Here the count should be 2, as john is common.

Somehow I am not able to remove the quotes from the second column, and I'm not sure why.

Thanks

Upvotes: 0

Views: 90

Answers (2)

CPak

Reputation: 13581

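library(stringr)
# df is the question's data frame: flatten both columns, strip the
# double quotes, and tabulate (table() drops NA by default)
table(str_replace_all(unlist(df), '["]', ''))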

# bravo  hulk  john 
#     1     1     2
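
If you would rather get the counts as numbers than read them off the table, here is a minimal base R sketch. It assumes the data frame is called `df` and looks like the sample in the question; `vals`, `counts`, `n_distinct` and `n_repeated` are names made up for illustration.

```r
# Sample data mirroring the question (the name df is an assumption)
df <- data.frame(
  string = c("john", "bravo", NA, NA),
  f.string.name = c(NA, NA, "\"john\"", "\"hulk\""),
  stringsAsFactors = FALSE
)

# Flatten both columns and strip literal double quotes (NA stays NA)
vals <- gsub('"', '', unlist(df, use.names = FALSE), fixed = TRUE)
vals <- vals[!is.na(vals)]

counts     <- table(vals)          # occurrences per name
n_distinct <- length(unique(vals)) # number of distinct names: 3
n_repeated <- sum(counts > 1)      # names that occur more than once: 1
```

Which of these numbers you want depends on what "distinct" means for your data, so treat this as a starting point rather than the definitive answer.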

Upvotes: 1

Rui Barradas

Reputation: 76402

The main problem I see is the NA values.
First, let's get rid of the quotes you mention.

dat$f.string.name <- gsub('["]', '', dat$f.string.name)

Now, count the number of distinct values.

i1 <- complete.cases(dat$string)
i2 <- complete.cases(dat$f.string.name)
sum(dat$string[i1] %in% dat$f.string.name[i2]) + sum(dat$f.string.name[i2] %in% dat$string[i1])
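
Note that the sum above counts matches in both directions, so a single shared name contributes 2. If you instead need the number of names the columns share, or the number of distinct names overall, a sketch with set operations may be clearer. It assumes the same sample data as the question; `a`, `b`, `common` and `all_names` are illustrative names only.

```r
# Same sample data as in the question
dat <- data.frame(
  string = c("john", "bravo", NA, NA),
  f.string.name = c(NA, NA, "\"john\"", "\"hulk\""),
  stringsAsFactors = FALSE
)

# Strip the literal double quotes from the second column
dat$f.string.name <- gsub('"', '', dat$f.string.name, fixed = TRUE)

# Drop NAs and duplicates within each column
a <- unique(dat$string[!is.na(dat$string)])
b <- unique(dat$f.string.name[!is.na(dat$f.string.name)])

common    <- intersect(a, b)  # names present in both columns
all_names <- union(a, b)      # distinct names across both columns

length(common)     # 1
length(all_names)  # 3
```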

DATA

dat <-
structure(list(string = c("john", "bravo", NA, NA), f.string.name = c(NA, 
NA, "\"john\"", "\"hulk\"")), .Names = c("string", "f.string.name"
), class = "data.frame", row.names = c(NA, -4L))

Upvotes: 2
