Reputation: 137
This question is a variant on the question I asked (very) recently here. (Sorry to ask two similar questions. I realised after asking that the question I asked wasn't quite right, but I thought I'd leave the original in case it's of use to people in future, and keep this question separate.)
I have a set of data that looks like this, modified slightly from the previous question:
Category Item Shop1 Shop2 Shop3
1 Fruit Apples 4 6 0
2 Fruit Oranges 0 2 7
3 Veg Potatoes 0 1 0
4 Veg Onions 0 2 8
5 Veg Carrots 0 1 3
6 Dairy Yoghurt 1 5 9
7 Dairy Milk 0 1 0
8 Dairy Cheese 0 0 7
I want to filter my data so that I only have Categories that are sold by all shops - if a shop doesn't have any sales for a whole Category, then I'd like to filter it out. In this example, the Veg category would be filtered out because Shop1 has no Veg sales.
In an attempt to resolve this, I tried changing the suggestions I was given for my previous question from using FUN = any
to FUN = all
but that didn't work and threw errors each time, and I'm unsure what else to try.
I'd appreciate any help you can offer.
Upvotes: 0
Views: 1961
Reputation: 33498
One more solution using data.table
using two steps.
# Data
dt <- data.table(Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
Item = c("Apples", "Oranges", "Potatoes", "Onions",
"Carrots", "Yoghurt", "Milk", "Cheese"),
Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7))
filt <- dt[, any(sum(Shop1) == 0, sum(Shop2) == 0, sum(Shop3) == 0),
by = Category]
filt
Category V1
1: Fruit FALSE
2: Veg TRUE
3: Dairy FALSE
dt[Category %in% filt[V1 == FALSE, Category]]
Category Item Shop1 Shop2 Shop3
1: Fruit Apples 4 6 0
2: Fruit Oranges 0 2 7
3: Dairy Yoghurt 1 5 9
4: Dairy Milk 0 1 0
5: Dairy Cheese 0 0 7
Upvotes: 0
Reputation: 2436
Here is an example using dplyr
. You first group_by
the Category variable, and only keep records that have more than 0 sale.
library(tidyverse)
d <- data_frame(
Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots", "Yoghurt", "Milk", "Cheese"),
Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
Shop3 = c(0, 7, 0, 8, 3, 9, 0, 7)
)
d %>%
group_by(Category) %>%
filter(sum(Shop1) > 0 & sum(Shop2) > 0 & sum(Shop3) > 0) %>%
ungroup()
Upvotes: 3
Reputation: 4482
Here is a solution using library(data.table)
dt <- data.table(category=c("Fruit","Fruit","Veg","Veg","Veg","Dairy","Dairy","Dairy"),
item=c("apples","oranges","potatoes","onions","carrots","yoghurt","milk","cheese"),
shop1=c(4,0,0,0,0,1,0,0),
shop2=c(6,2,1,2,1,5,1,0),
shop3=c(0,7,0,8,3,9,0,7))
dt_m <- melt(dt,id.vars = c("category","item"))
dt_m[,counts:=sum(value),by=.(category,variable)]
dt_m <- dt_m[counts>0]
dt_m[,counts:=NULL]
dt <- dcast.data.table(dt_m,category+item~variable,value.var = "value")
dt <- na.omit(dt)
Or using dplyr
dt %>% melt(id.vars = c("category","item")) %>% group_by(category,variable) %>%
mutate(counts=sum(value)) %>% filter(counts>0) %>% mutate(counts=NULL) %>%
dcast(category+item~variable,value.var = "value") %>% na.omit()
Upvotes: 1
Reputation: 51582
Here is one idea with colSums
,
ind <- colSums(sapply(split(df[3:5], df$Category), function(i) colSums(i) == 0)) == 0
df[df$Category %in% names(ind)[ind],]
Which gives,
Category Item Shop1 Shop2 Shop3 1 Fruit Apples 4 6 0 2 Fruit Oranges 0 2 7 6 Dairy Yoghurt 1 5 9 7 Dairy Milk 0 1 0 8 Dairy Cheese 0 0 7
Upvotes: 4