Taha Abrar

Reputation: 33

How to find all combinations in column and count occurrences in data

I am trying to find every combination of column 1 values that actually occurs in my data, grouped by column 2.

I then want to count how often each of these combinations occurs across the IDs in column 2.

It feels like R should be able to do this fairly quickly. I tried reading up on combn and expand.grid, but with no success. The main problem was that I could not find any guidance on how to generate combinations within a column.

My data looks like:

Animal (n=57) | Person ID (n=1000)
Dog     | 0001
Cat     | 0004
Bird    | 0001
Snake   | 0002 
Spider  | 0002
Cat     | 0003
Dog     | 0004

Expected output is:

AnimalComb  | CountbyID
Cat         | 1
DogBird     | 1
SnakeSpider | 1
CatDog      | 1

EDIT: deleted an erroneous entry for Cat.

Upvotes: 3

Views: 980

Answers (2)

akrun

Reputation: 887851

An option using data.table

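library(data.table)
# Convert df to a data.table by reference, then summarise per person.
# toString() joins the unique animals with ", "; use paste(..., collapse = "") for labels like "DogBird".
setDT(df)[, .(AnimalComb = toString(unique(Animal)),
              CountbyID = .N / uniqueN(Animal)), by = PersonID]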

data

df <- structure(list(Animal = c("Dog", "Cat", "Bird", "Snake", "Spider", 
"Cat", "Dog"), PersonID = c(1L, 4L, 1L, 2L, 2L, 3L, 4L)),
 class = "data.frame", row.names = c(NA, -7L))

Upvotes: 0

Ronak Shah

Reputation: 389235

If I have understood you correctly, you need to group_by PersonID, paste all the unique Animal values in each group together, and count the occurrences of that combination. The count is the number of rows in the group (n()) divided by the number of distinct values in it (n_distinct()).

library(dplyr)

df %>%
  group_by(PersonID) %>%
  summarise(AnimalComb = paste(unique(Animal), collapse = ""), 
            CountbyID = n() / n_distinct(Animal)) 

#  PersonID AnimalComb  CountbyID
#     <int> <chr>           <dbl>
#1        1 DogBird             1
#2        2 SnakeSpider         1
#3        3 Cat                 1
#4        4 CatDog              1
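
If the goal is instead to count how many persons share each combination, a second tally step is needed after building one combination per person. Below is a minimal base R sketch of that idea, not the approach used above. It assumes that the order of animals within a combination does not matter, so the labels are sorted (giving "BirdDog" rather than "DogBird"); the names combo_per_id and counts are made up for illustration.

```r
# Same sample data as in the question
df <- data.frame(
  Animal = c("Dog", "Cat", "Bird", "Snake", "Spider", "Cat", "Dog"),
  PersonID = c(1L, 4L, 1L, 2L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

# One combination label per person.
# Assumption: order is irrelevant, so sort() makes "Dog" + "Bird" and
# "Bird" + "Dog" produce the same label.
combo_per_id <- tapply(df$Animal, df$PersonID,
                       function(x) paste(sort(unique(x)), collapse = ""))

# Tally how many persons have each combination
counts <- as.data.frame(table(AnimalComb = combo_per_id),
                        responseName = "CountbyID",
                        stringsAsFactors = FALSE)
counts
```

With the sample data every combination belongs to exactly one person, so each count is 1. If a second person had both a Dog and a Cat, the CatDog row would show 2.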

Upvotes: 5
