user13069688
user13069688

Reputation: 353

Calculate count and proportion of unique observations based on two variables

I have the following data:

data <- structure(list(class = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 
                             3, 3, 3, 3, 3, 1, 1, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 
                             5), ID = c(700, 700, 800, 800, 800, 300, 300, 300, 300, 555, 
                                        555, 555, 555, 555, 555, 555, 555, 700, 700, 900, 900, 800, 300, 
                                        300, 300, 300, 555, 555, 555, 555, 555, 555, 555, 555), type = c(1, 
                                                                                                         1, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 
                                                                                                         3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1), date = structure(c(1610668800, 
                                                                                                                                                                 1610668800, 1610668800, 1610668800, 1610668800, 1610668800, 1610668800, 
                                                                                                                                                                 1610668800, 1610668800, 1610668800, 1610668800, 1610668800, 1610668800, 
                                                                                                                                                                 1610668800, 1610668800, 1610668800, 1610668800, 1610841600, 1610841600, 
                                                                                                                                                                 1610841600, 1610841600, 1610841600, 1610841600, 1610841600, 1610841600, 
                                                                                                                                                                 1610841600, 1610841600, 1610841600, 1610841600, 1610841600, 1610841600, 
                                                                                                                                                                 1610841600, 1610841600, 1610841600), class = c("POSIXct", "POSIXt"
                                                                                                                                                                 ), tzone = "UTC")), row.names = c(NA, -34L), class = c("tbl_df", 
                                                                                                                                                                                                                        "tbl", "data.frame"))

What I would like to do is to calculate the count / unique times of the ID column per date and per class and then calculate each type of 1,2 and 3. So for example, although the ID 700 appears 2 times on the 2021-01-15 I would like to contribute one time in the percentage.

I have tried the following with different variations with no success:

data_perc <-  data %>%                                                                                                                                                                                                                         
 tabyl(class, type)

So my results should look something like the following:

class     date     type1   type2   type3
  1    2021-01-15   30%     30%     40%
  1    2021-01-17   33%     33%     34%
  2    2021-01-15   60%     20%     20%   

Thank you in advance :)

Upvotes: 0

Views: 540

Answers (1)

Quixotic22
Quixotic22

Reputation: 2924

Not sure if I'm misunderstanding the question or the expected results isn't from the sample provided but anyway this is a solution to what I think you want.

data %>%
  group_by(class, date, type) %>%
  summarise(n = n_distinct(ID)) %>%
  group_by(class, date) %>% 
  mutate(
    n = n/sum(n),
    type = paste0("type", type),
    n = scales::percent(n) #optional row if want formatted as percent string
  ) %>%
  pivot_wider(
    names_from = type,
    values_from = n,
    values_fill = NA
  )

Upvotes: 1

Related Questions