Eisen

Reputation: 1897

Getting unique values in a group_by in R with flagged columns

I have a table like this in R. Each record is a food cafe, and a food cafe can contain many restaurants. I want to count the unique regulations at the restaurant level. The table looks like this:

mc_d  burger_king  pizza_hut  wendys  regulation_id
   0            1          1       1            101
   1            0          0       1            101
   1            1          0       1             29
   0            0          0       1             17

and this is the intended output:

restaurant  | unique_regulations
--------------------------------
mc_d        |    2
burger_king |    2
pizza_hut   |    1
wendys      |    3

  

I want to do this using dplyr. I tried a case_when() statement like the one below, but that doesn't actually work: for example, if a row has both mc_d and burger_king flagged as 1, it gets labeled as mc_d even though it is also burger_king.

food_cafe %>%
  mutate(restaurant = case_when(mc_d == 1 ~ "mc_d",
                                burger_king == 1 ~ "burger_king",
                                pizza_hut == 1 ~ "pizza_hut",
                                wendys == 1 ~ "wendys")) %>%
  group_by(restaurant) %>%
  summarise(unique_regulations = n_distinct(regulation_id))
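
Here is the intermediate labelling (a minimal sketch with the sample data above) that shows the problem: case_when() keeps only the first matching label per row, so the first row becomes "burger_king" even though pizza_hut and wendys are also flagged, and pizza_hut never appears in the grouped counts at all.

# Sketch with the sample data above: only the first TRUE condition wins
library(dplyr)
food_cafe %>%
  mutate(restaurant = case_when(mc_d == 1 ~ "mc_d",
                                burger_king == 1 ~ "burger_king",
                                pizza_hut == 1 ~ "pizza_hut",
                                wendys == 1 ~ "wendys")) %>%
  select(restaurant, regulation_id)
#   restaurant regulation_id
#1 burger_king           101
#2        mc_d           101
#3        mc_d            29
#4      wendys            17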
         

Upvotes: 0

Views: 73

Answers (3)

akrun

Reputation: 887851

1) We can reshape into 'long' format, filter on the binary 'value' column, and then do a group_by/n_distinct

library(dplyr)
library(tidyr)
food_cafe %>%
    pivot_longer(cols = -regulation_id, names_to = 'restaurant') %>%
    filter(as.logical(value)) %>%
    # or use
    # filter(value == 1) %>%
    group_by(restaurant) %>%
    summarise(unique_regulations = n_distinct(regulation_id))
# A tibble: 4 x 2
#  restaurant  unique_regulations
#  <chr>                    <int>
#1 burger_king                  2
#2 mc_d                         2
#3 pizza_hut                    1
#4 wendys                       3

2) Or, without reshaping, we can do a group_by/any, then transpose the dataset and take rowSums

food_cafe %>% 
   group_by(regulation_id) %>% 
   summarise(across(everything(), any))  %>%
   data.table::transpose(., make.names = 'regulation_id',
          keep.names = 'restaurant') %>%
   transmute(restaurant, unique_regulations = rowSums(.[-1]))
#    restaurant unique_regulations
#1        mc_d                  2
#2 burger_king                  2
#3   pizza_hut                  1
#4      wendys                  3

3) Or a variation of the above

food_cafe %>%
    group_by(regulation_id) %>% 
    summarise(across(everything(), any)) %>%
    select(-1) %>% 
    summarise(across(everything(), sum)) %>% 
    pivot_longer(everything())
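
The output columns here default to 'name' and 'value'; if they should match the headers in the question, pivot_longer() takes names_to/values_to arguments (a small sketch, same data and libraries as above):

food_cafe %>%
    group_by(regulation_id) %>%
    summarise(across(everything(), any)) %>%
    select(-1) %>%
    summarise(across(everything(), sum)) %>%
    pivot_longer(everything(), names_to = 'restaurant',
        values_to = 'unique_regulations')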

4) Or with colSums and enframe

library(tibble)
food_cafe %>% 
    group_by(regulation_id) %>%
    summarise(across(everything(), any)) %>% 
    select(-1) %>%
    colSums %>%
    enframe
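
Similarly, enframe() accepts name/value arguments, so the columns can be labelled directly (a sketch under the same assumptions):

food_cafe %>%
    group_by(regulation_id) %>%
    summarise(across(everything(), any)) %>%
    select(-1) %>%
    colSums %>%
    enframe(name = 'restaurant', value = 'unique_regulations')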

5) Or use melt from data.table (after converting the data.frame to a data.table with setDT) to reshape to 'long' format, then specify the logical index in i, group by the restaurant, and count the number of distinct elements with uniqueN

library(data.table)
melt(setDT(food_cafe), id.var = 'regulation_id')[as.logical(value), 
    .(unique_regulation = uniqueN(regulation_id)),.(restaurant = variable)]
#    restaurant unique_regulation
#1:        mc_d                 2
#2: burger_king                 2
#3:   pizza_hut                 1
#4:      wendys                 3

6) Or, using base R: convert the data to a matrix with the row names set to 'regulation_id', reshape into 'long' format with as.data.frame.table, subset the rows where 'Freq' is 1, build a table from the unique rows of the restaurant/regulation columns, and take the rowSums (then stack it into a two-column data.frame if needed)

stack(rowSums(table(unique(subset(as.data.frame.table(`row.names<-`(
       as.matrix(food_cafe[-5]), food_cafe[[5]])), Freq == 1)[2:1]))))[2:1]
#          ind values
#1        mc_d      2
#2 burger_king      2
#3   pizza_hut      1
#4      wendys      3

7) Or a one-liner with colSums/aggregate in base R

stack(colSums(aggregate(.~ regulation_id, food_cafe, any)[-1]))[2:1]
#          ind values
#1        mc_d      2
#2 burger_king      2
#3   pizza_hut      1
#4      wendys      3

8) Or another base R option with by/colSums

stack(colSums(do.call(rbind, by(food_cafe[-5], 
          food_cafe$regulation_id, FUN = colSums)) > 0))[2:1]
#          ind values
#1        mc_d      2
#2 burger_king      2
#3   pizza_hut      1
#4      wendys      3

data

food_cafe <- structure(list(mc_d = c(0L, 1L, 1L, 0L),
    burger_king = c(1L, 0L, 1L, 0L), pizza_hut = c(1L, 0L, 0L, 0L),
    wendys = c(1L, 1L, 1L, 1L), regulation_id = c(101L, 101L, 29L, 17L)),
    class = "data.frame", row.names = c(NA, -4L))

Upvotes: 1

Ivalbert Pereira

Reputation: 304

If you decide to use base R:

df <- data.frame(mc_d = c(0, 1, 1, 0),
                 burger_king = c(1, 0, 1, 0),
                 pizza_hut = c(1, 0, 0, 0),
                 wendys = c(1, 1, 1, 1),
                 regulation_id = c(101, 101, 29, 17))

col_reg <- 5
unique_regulations <- unlist(lapply(1:4, function(x)
  length(unique(df[which(df[, x] == 1), col_reg]))))
restaurant <- colnames(df)[-5]
data.frame(restaurant, unique_regulations)
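
A variation on the same idea indexes by column name instead of position, which stays valid if columns are reordered (a sketch, assuming the df above where every column except regulation_id is a 0/1 flag):

# Sketch: count distinct regulation_id values per flag column, by name
restaurants <- setdiff(names(df), "regulation_id")
unique_regulations <- sapply(restaurants, function(r)
  length(unique(df$regulation_id[df[[r]] == 1])))
data.frame(restaurant = restaurants, unique_regulations, row.names = NULL)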

Upvotes: 0

Ronak Shah

Reputation: 389235

We can get the data in long format and, for each name, count the unique regulation_id values where value is 1.

library(dplyr)

food_cafe %>%
  tidyr::pivot_longer(cols = -regulation_id) %>%
  group_by(name) %>%
  summarise(unique_regulations = n_distinct(regulation_id[value == 1]))

#  name        unique_regulations
#  <chr>                    <int>
#1 burger_king                  2
#2 mc_d                         2
#3 pizza_hut                    1
#4 wendys                       3

Upvotes: 1
