Reputation: 623
I have an array of strings made up of combinations of the four letters J, K, Q, Z. Each entry has at least two and at most four letters. For example: data <- c("QK", "KQ", "JKQZ", "KJZ").
I would like to count the number of times each entry occurs, without distinguishing between strings that are made up of the same letters in a different order. I know table(data) doesn't do this, since it doesn't treat QK and KQ as the same, and returns
data
JKQZ  KJZ   KQ   QK
   1    1    1    1
I have been looking at pmatch and charmatch, but neither seems to do what I want.
EDIT: I should clarify that no entry contains a repeated letter. That is, an entry such as ZZ or KZK cannot occur.
Upvotes: 4
Views: 161
Reputation: 9687
I would first build a table of letter counts for each entry (converting it to a factor with fixed levels so that absent letters show up as zero cells), then hash each row and count the hashes:
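library(magrittr)  # provides %>%
library(digest)    # provides digest()
data <- c("QK", "KQ", "JKQZ", "KJZ")
# Split each entry into letters, tabulate them against fixed levels
# (so missing letters count as 0), and stack the tables into a matrix.
tbl <- strsplit(data, "") %>%
  lapply(factor, levels = c("K", "Q", "J", "Z")) %>%
  lapply(table) %>%
  do.call(what = rbind)
tbl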
which gives this:
     K Q J Z
[1,] 1 1 0 0
[2,] 1 1 0 0
[3,] 1 1 1 1
[4,] 1 0 1 1
Then hash and count:
h <- apply(tbl, 1, digest)                         # one hash per row
tbl <- cbind(tbl, count = as.vector(table(h)[h]))  # attach how often each row's hash occurs
tbl <- tbl[!duplicated(h), ]                       # keep the first row of each group
Here's the result:
     K Q J Z count
[1,] 1 1 0 0     2
[2,] 1 1 1 1     1
[3,] 1 0 1 1     1
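For comparison, here is a minimal sketch of the same idea without the digest package, assuming a plain string key is an acceptable stand-in for the hash. Each row of the matrix is pasted into a key such as "1100", and identical keys are counted. The names lv, key, counts and result are illustrative only, and the matrix here records presence or absence of each letter, which is enough because no entry repeats a letter.

```r
data <- c("QK", "KQ", "JKQZ", "KJZ")
lv <- c("K", "Q", "J", "Z")

# One row per entry, one column per letter: 1 if the letter is present, else 0.
tbl <- t(sapply(strsplit(data, ""), function(x) as.integer(lv %in% x)))
colnames(tbl) <- lv

# Collapse each row into a string key, e.g. "1100" for an entry made of K and Q.
key <- apply(tbl, 1, paste, collapse = "")

# Count how often each key occurs, then keep one row per distinct key.
counts <- table(key)
result <- cbind(tbl, count = as.vector(counts[key]))[!duplicated(key), , drop = FALSE]
result
```

The drop = FALSE argument keeps the result a matrix even if only one distinct letter set remains.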
Upvotes: 1
Reputation: 66819
Here's a longer variation on David's comment/answer:
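# All letters that occur in the data, in alphabetical order
vals <- sort(unique(unlist(strsplit(data, ""))))
# Every combination of those letters, from one letter up to all of them
combos <- unlist(sapply(seq_along(vals), function(i) combn(vals, i, paste0, collapse = "")))
# Sort the letters within each entry and use the combinations as factor levels
newdata <- factor(sapply(strsplit(data, ""), function(x) paste0(sort(x), collapse = "")),
                  levels = combos)
tab <- table(newdata)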
# newdata
#    J    K    Q    Z   JK   JQ   JZ   KQ   KZ   QZ  JKQ  JKZ  JQZ  KQZ JKQZ
#    0    0    0    0    0    0    0    2    0    0    0    1    0    0    1
tab[tab>0] # alternately
#   KQ  JKZ JKQZ
#    2    1    1
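If the zero-count combinations are not needed, the core step can be sketched on its own: sort the letters within each entry so that QK and KQ become the same string, then tabulate. This is a minimal base-R sketch; the names canon and counts are illustrative.

```r
data <- c("QK", "KQ", "JKQZ", "KJZ")

# Canonical form of an entry: its letters in alphabetical order.
canon <- vapply(strsplit(data, ""),
                function(x) paste(sort(x), collapse = ""),
                character(1))

# Entries with the same letters now share one label, so table() counts them together.
counts <- table(canon)
counts
```

Using vapply with character(1) guarantees a character vector is returned, one element per entry.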
Upvotes: 2