joy_1379

Reputation: 499

Count number of unique values in two columns by group

I have a data frame with IDs for web page ('Webpage'), department ('Dept') and employee ('Emp_ID'):

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                 Dept = c(101, 101, 101, 102, 102, 103), 
                 Emp_ID = c(1, 1, 2, 3, 4, 4)) 

#   Webpage Dept Emp_ID
# 1     111  101      1
# 2     111  101      1
# 3     111  101      2
# 4     111  102      3
# 5     222  102      4
# 6     222  103      4

I want to know how many unique individuals have seen each web page.

For example, in this dataset web page 111 has been seen by three individuals, where an individual is a unique combination of Dept and Emp_ID: Emp_ID 1 and 2 in Dept 101, and Emp_ID 3 in Dept 102. Similarly, web page 222 has been seen by two different individuals.

My first attempt is:

nrow(unique(df[ , c("Dept", "Emp_ID")]))

Using unique I can do this for one web page, but can someone please suggest how I can calculate it for all web pages?

Upvotes: 0

Views: 54

Answers (3)

Yuriy Saraykin

Reputation: 8880

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                 Dept = c(101, 101, 101, 102, 102, 103), 
                 Emp_Id = c(1, 1, 2, 3, 4, 4))
library(dplyr)

df %>% 
  group_by(Webpage) %>% 
  summarise(n = n_distinct(Dept, Emp_Id))
#> # A tibble: 2 x 2
#>   Webpage     n
#>     <dbl> <int>
#> 1     111     3
#> 2     222     2

library(data.table)
setDT(df)[, list(n = uniqueN(paste(Dept, Emp_Id, sep = "_"))), by = Webpage]
#>    Webpage n
#> 1:     111 3
#> 2:     222 2

Created on 2021-03-30 by the reprex package (v1.0.0)
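
One caveat when two ID columns are pasted into a single key: without a separator, as with paste0, different pairs can produce the same string. The sketch below uses hypothetical IDs (not taken from the question's data) to show the collision, and two ways around it in base R. In data.table, passing the columns as a table, e.g. uniqueN(data.table(Dept, Emp_Id)), should avoid the problem as well.

```r
# Hypothetical IDs chosen only to illustrate the collision.
dept <- c(10, 101)
emp  <- c(11, 1)

# Without a separator both pairs become "1011", so they count as one.
n_pasted <- length(unique(paste0(dept, emp)))

# A separator keeps the two pairs apart: "10_11" and "101_1".
n_separated <- length(unique(paste(dept, emp, sep = "_")))

# Counting unique rows avoids building string keys altogether.
n_rows <- nrow(unique(data.frame(dept, emp)))

c(pasted = n_pasted, separated = n_separated, rows = n_rows)
```

With the data in the question no such collision occurs, so the counts shown above are unaffected.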

Upvotes: 2

ThomasIsCoding

Reputation: 101343

Base R's aggregate can help here:

> aggregate(cbind(n_viewer = Emp_Id) ~ Webpage, unique(df), length)
  Webpage n_viewer
1     111        3
2     222        2
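
Note that unique(df) deduplicates on every column, so the call above relies on df containing only Webpage, Dept and Emp_Id. A minimal sketch of the same idea with the columns named explicitly, which should keep working if df gains further columns (the names viewers and n_viewer are just illustrative):

```r
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept = c(101, 101, 101, 102, 102, 103),
                 Emp_Id = c(1, 1, 2, 3, 4, 4))

# Keep one row per (Webpage, Dept, Emp_Id) combination.
viewers <- unique(df[, c("Webpage", "Dept", "Emp_Id")])

# Count the remaining rows for each web page.
n_viewer <- table(viewers$Webpage)
n_viewer
```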

Upvotes: 0

Ronak Shah

Reputation: 388982

For each Webpage, count the unique combinations of the other two columns using duplicated.

library(dplyr)

df %>%
  group_by(Webpage) %>%
  summarise(n_viewers = sum(!duplicated(cur_data())))

#  Webpage n_viewers
#    <dbl>     <int>
#1     111         3
#2     222         2
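
As of dplyr 1.1.0, cur_data() is deprecated in favour of pick(), so on recent versions sum(!duplicated(pick(Dept, Emp_Id))) is probably the form to prefer. The same duplicated() idea also works without dplyr; a minimal base R sketch, assuming the df defined under "data" below:

```r
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept = c(101, 101, 101, 102, 102, 103),
                 Emp_Id = c(1, 1, 2, 3, 4, 4))

# Split the two ID columns by web page, then count the rows in each
# piece that are not duplicates of an earlier row.
groups <- split(df[c("Dept", "Emp_Id")], df$Webpage)
n_viewers <- sapply(groups, function(g) sum(!duplicated(g)))
n_viewers
```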

data

Please provide data in a reproducible format that is easy to copy, rather than as an image.

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                 Dept = c(101, 101, 101, 102, 102, 103), 
                 Emp_Id = c(1, 1, 2, 3, 4, 4))

Upvotes: 2
