Reputation: 33
I am trying to find every combination of column 1 values that actually occurs in my data.
I then want to count the occurrences of each combination by column 2.
It feels like R should be able to do this fairly quickly. I tried reading up on combn and expand.grid, but with no success. The main problem was that I could not find any guidance on how to generate combinations within a column.
My data looks like:
Animal (n=57) | Person ID (n=1000)
Dog | 0001
Cat | 0004
Bird | 0001
Snake | 0002
Spider | 0002
Cat | 0003
Dog | 0004
Expected output is:
AnimalComb | CountbyID
Cat | 1
DogBird | 1
SnakeSpider | 1
CatDog | 1
EDIT: deleted an erroneous entry for Cat.
Upvotes: 3
Views: 980
Reputation: 887851
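An option using data.table:
library(data.table)

# Sample data from the question
df <- structure(list(Animal = c("Dog", "Cat", "Bird", "Snake", "Spider",
                                "Cat", "Dog"),
                     PersonID = c(1L, 4L, 1L, 2L, 2L, 3L, 4L)),
                class = "data.frame", row.names = c(NA, -7L))

# One row per person: unique animals joined by toString() (separator ", "),
# and the row count divided by the number of distinct animals
setDT(df)[, .(AnimalComb = toString(unique(Animal)),
              CountbyID = .N / uniqueN(Animal)), by = PersonID]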
Upvotes: 0
Reputation: 389235
If I have understood you correctly, you need to group_by PersonID and paste all the unique Animal values in each group together. The number of occurrences of each combination can then be computed as the number of rows in the group (n()) divided by the number of distinct values (n_distinct).
library(dplyr)

df %>%
  group_by(PersonID) %>%
  summarise(AnimalComb = paste(unique(Animal), collapse = ""),
            CountbyID = n() / n_distinct(Animal))

#   PersonID AnimalComb  CountbyID
#      <int> <chr>           <dbl>
# 1        1 DogBird             1
# 2        2 SnakeSpider         1
# 3        3 Cat                 1
# 4        4 CatDog              1
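If the goal is instead to count how many persons share each combination, a minimal base R sketch might look like the following. It assumes that the order of animals within a person does not matter (so "DogCat" and "CatDog" should count as the same combination), which the question does not state explicitly. The names combos and counts are only illustrative.

```r
# Sample data from the question
df <- data.frame(
  Animal   = c("Dog", "Cat", "Bird", "Snake", "Spider", "Cat", "Dog"),
  PersonID = c(1L, 4L, 1L, 2L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

# One combination string per person; sort() puts the animals in a
# canonical order (assumption: order within a person is irrelevant)
combos <- vapply(
  split(df$Animal, df$PersonID),
  function(x) paste(sort(unique(x)), collapse = ""),
  character(1)
)

# Count how many persons have each combination
counts <- as.data.frame(
  table(AnimalComb = combos),
  responseName = "CountbyID",
  stringsAsFactors = FALSE
)
counts
```

With the sample data, each of the four combinations (BirdDog, Cat, CatDog, SnakeSpider) belongs to exactly one person, so every count is 1. On the full data, persons sharing the same set of animals would be pooled into one row with a higher count.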
Upvotes: 5