Reputation: 137

Subsetting rows in R, and filtering out rows if column has all-zero values for subset

This question is a variant on the question I asked (very) recently here. (Sorry to ask two similar questions. I realised after asking that the question I asked wasn't quite right, but I thought I'd leave the original in case it's of use to people in future, and keep this question separate.)

I have a set of data that looks like this, modified slightly from the previous question:

  Category     Item Shop1 Shop2 Shop3
1    Fruit   Apples     4     6     0
2    Fruit  Oranges     0     2     7
3      Veg Potatoes     0     1     0
4      Veg   Onions     0     2     8
5      Veg  Carrots     0     1     3
6    Dairy  Yoghurt     1     5     9
7    Dairy     Milk     0     1     0
8    Dairy   Cheese     0     0     7

I want to filter my data so that I only have Categories that are sold by all shops - if a shop doesn't have any sales for a whole Category, then I'd like to filter it out. In this example, the Veg category would be filtered out because Shop1 has no Veg sales.

In an attempt to resolve this, I tried changing the suggestions I was given for my previous question from using FUN = any to FUN = all but that didn't work and threw errors each time, and I'm unsure what else to try.

I'd appreciate any help you can offer.

Upvotes: 0

Answers (4)

s_baldur

Reputation: 33743

One more solution using data.table using two steps.

# Data
dt <- data.table(Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
                 Item     = c("Apples", "Oranges", "Potatoes", "Onions", 
                              "Carrots", "Yoghurt", "Milk", "Cheese"),
                 Shop1    = c(4, rep(0, 4), 1, rep(0, 2)),
                 Shop2    = c(6, 2, 1, 2, 1, 5, 1, 0),
                 Shop3    = c(0, 7, 0, 8, 3, 9, 0, 7))

filt <- dt[, any(sum(Shop1) == 0, sum(Shop2) == 0, sum(Shop3) == 0), 
           by = Category]
filt
       Category    V1
1:    Fruit FALSE
2:      Veg  TRUE
3:    Dairy FALSE

dt[Category %in% filt[V1 == FALSE, Category]]

   Category    Item Shop1 Shop2 Shop3
1:    Fruit  Apples     4     6     0
2:    Fruit Oranges     0     2     7
3:    Dairy Yoghurt     1     5     9
4:    Dairy    Milk     0     1     0
5:    Dairy  Cheese     0     0     7

Upvotes: 0

fmic_

Reputation: 2446

Here is an example using dplyr. You first group_by the Category variable, and only keep records that have more than 0 sale.

library(tidyverse)
d <- data_frame(
    Category = c(rep("Fruit", 2), rep("Veg", 3), rep("Dairy", 3)),
    Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots", "Yoghurt", "Milk", "Cheese"),
    Shop1 = c(4, rep(0, 4), 1, rep(0, 2)),
    Shop2 = c(6, 2, 1, 2, 1, 5, 1, 0),
    Shop3 = c(0, 7, 0, 8,  3, 9, 0, 7)
)

d %>%
    group_by(Category) %>%
    filter(sum(Shop1)  > 0 & sum(Shop2) > 0 &  sum(Shop3) > 0) %>%
    ungroup()

Upvotes: 3

quant

Reputation: 4492

Here is a solution using library(data.table)

dt <- data.table(category=c("Fruit","Fruit","Veg","Veg","Veg","Dairy","Dairy","Dairy"),
                            item=c("apples","oranges","potatoes","onions","carrots","yoghurt","milk","cheese"),
                            shop1=c(4,0,0,0,0,1,0,0),
                            shop2=c(6,2,1,2,1,5,1,0),
                            shop3=c(0,7,0,8,3,9,0,7))
dt_m <- melt(dt,id.vars = c("category","item"))
dt_m[,counts:=sum(value),by=.(category,variable)]
dt_m <- dt_m[counts>0]
dt_m[,counts:=NULL]
dt <- dcast.data.table(dt_m,category+item~variable,value.var = "value")
dt <- na.omit(dt)

Or using dplyr

dt %>% melt(id.vars = c("category","item")) %>% group_by(category,variable) %>%
  mutate(counts=sum(value)) %>% filter(counts>0) %>% mutate(counts=NULL) %>% 
  dcast(category+item~variable,value.var = "value") %>% na.omit()

Upvotes: 1

Sotos

Reputation: 51602

Here is one idea with colSums,

ind <- colSums(sapply(split(df[3:5], df$Category), function(i) colSums(i) == 0)) == 0
df[df$Category %in% names(ind)[ind],]

Which gives,

 Category    Item Shop1 Shop2 Shop3
1    Fruit  Apples     4     6     0
2    Fruit Oranges     0     2     7
6    Dairy Yoghurt     1     5     9
7    Dairy    Milk     0     1     0
8    Dairy  Cheese     0     0     7

Upvotes: 4

Subsetting rows in R, and filtering out rows if column has all-zero values for subset

Answers (4)

Related Questions