Reputation: 1897
I have a table like this in R. Each record is a food cafe and a food cafe that can have many restaurants inside. I want to get the unique regulations for each food cafe at the restaurant level. The table looks like this:
mc_d burger_king pizza_hut wendys regulation_id
0 1 1 1 101
1 0 0 1 101
1 1 0 1 29
0 0 0 1 17
and this is the intended output:
restaurant | unique_regulations
--------------------------------
mc_d | 2
burger_king | 2
pizza_hut | 1
wendys | 3
I want to do this using dplyr. I tried using a case when statement like below but that doesn't actually work. For example if there is a row with both mc_d and burger_king flagged as 1, it will be labeled as mc_d, when it's actually also burger_king.
food_cafe %>%
mutate(restaurant = case_when(mc_d == 1 ~ "mc_d",
burger_king == 1 ~ "burger_king",
pizza_hut == 1 ~ "pizza_hut",
wendys == 1 ~ "wendys")) %>%
group_by(restaurant) %>%
summarise(unique_regulations = n_distinct(regulation_id))
Upvotes: 0
Views: 73
Reputation: 887851
1) We can reshape into 'long' format, filter by the binary column and then do a group_by
n_distinct
library(dplyr)
library(tidyr)
food_cafe %>%
pivot_longer(cols = -regulation_id, names_to = 'restaurant') %>%
filter(as.logical(value)) %>%
# or use
# filter(value == 1) %>%
group_by(restaurant) %>%
summarise(unique_regulations = n_distinct(regulation_id))
# A tibble: 4 x 2
# restaurant unique_regulations
# <chr> <int>
#1 burger_king 2
#2 mc_d 2
#3 pizza_hut 1
#4 wendys 3
2) Or without reshaping, we can do a group_by any
, then transpose
the dataset and do a rowSums
food_cafe %>%
group_by(regulation_id) %>%
summarise(across(everything(), any)) %>%
data.table::transpose(., make.names = 'regulation_id',
keep.names = 'restaurant') %>%
transmute(restaurant, unique_regulations = rowSums(.[-1]))
# restaurant unique_regulations
#1 mc_d 2
#2 burger_king 2
#3 pizza_hut 1
#4 wendys 3
3) Or a variation of the above
food_cafe %>%
group_by(regulation_id) %>%
summarise(across(everything(), any)) %>%
select(-1) %>%
summarise(across(everything(), sum)) %>%
pivot_longer(everything())
4) Or with colSums
and enframe
library(tibble)
food_cafe %>%
group_by(regulation_id) %>%
summarise(across(everything(), any)) %>%
select(-1) %>%
colSums %>%
enframe
5) Or using melt
(to reshape to long format) from data.table
after converting the data.frame to data.table (setDT
), then specify the logical index in i
, do a group by and count the number of distinct elements with uniqueN
library(data.table)
melt(setDT(food_cafe), id.var = 'regulation_id')[as.logical(value),
.(unique_regulation = uniqueN(regulation_id)),.(restaurant = variable)]
# restaurant unique_regulation
#1: mc_d 2
#2: burger_king 2
#3: pizza_hut 1
#4: wendys 3
6) Or using base R
, convert the 'data.frame' to table
class while setting the row names, reshape into 'long' format by converting to data.frame (as.data.frame
), subset
the rows where 'Freq' is 1, get a table
of unique
rows of subset of columns and get the rowSums
(if needed stack
it to two column data.frame)
stack(rowSums(table(unique(subset(as.data.frame.table(`row.names<-`(
as.matrix(food_cafe[-5]), food_cafe[[5]])), Freq == 1)[2:1]))))[2:1]
# ind values
#1 mc_d 2
#2 burger_king 2
#3 pizza_hut 1
#4 wendys 3
7) Or a one-liner with rowSums/aggregate
in base R
stack(colSums(aggregate(.~ regulation_id, food_cafe, any)[-1]))[2:1]
# ind values
#1 mc_d 2
#2 burger_king 2
#3 pizza_hut 1
#4 wendys 3
8) Or another base R option with by/colSums
stack(colSums(do.call(rbind, by(food_cafe[-5],
food_cafe$regulation_id, FUN = colSums)) > 0))[2:1]
# ind values
#1 mc_d 2
#2 burger_king 2
#3 pizza_hut 1
#4 wendys 3
food_cafe <- structure(list(mc_d = c(0L, 1L, 1L, 0L), burger_king = c(1L,
0L, 1L, 0L), pizza_hut = c(1L, 0L, 0L, 0L), wendys = c(1L, 1L,
1L, 1L), regulation_id = c(101L, 101L, 29L, 17L)), class = "data.frame",
row.names = c(NA,
-4L))
Upvotes: 1
Reputation: 304
If you decide to use base R:
df <- data.frame(mc_d = c(0,1,1,0),
burger_king= c(1,0,1,0),
pizza_hut=c(1,0,0,0),
wendys=c(1,1,1,1),
regulation_id=c(101,101,29,17))
col_reg <- 5
unique_regulations <- unlist(lapply(1:4, function(x)length(unique(df[which(df[,x] == 1),col_reg]))))
restaurant <- colnames(df)[-5]
data.frame(restaurant,unique_regulations)
Upvotes: 0
Reputation: 389235
We can get the data in long format and for each name
count unique regulation_id
where value = 1
.
library(dplyr)
food_cafe %>%
tidyr::pivot_longer(cols = -regulation_id) %>%
group_by(name) %>%
summarise(unique_regulations = n_distinct(regulation_id[value == 1]))
# name unique_regulations
# <chr> <int>
#1 burger_king 2
#2 mc_d 2
#3 pizza_hut 1
#4 wendys 3
Upvotes: 1