Reputation: 79

Creating Subset of data table in R based on another column condition

I want to create a subset of the below candyData in R in such a way that I should group the data by Brand and for each unique Brand, I should find and print the maximum of A and B. For illustration the new data should have Brand value Nestle appearing twice, the corresponding Candy Value A and B both appearing once corresponding to Nestle and their maximum values in the third column, Similarly for all brand values. Thanks and Please help.

candyData <- read.table(
text = "
Brand       Candy           value
Nestle      A               12
Nestle      B               34
Nestle      A               32
Hershey's   A               55
Hershey's   B               14
Hershey's   B               19
Mars        B               24
Nestle      B               26
Nestle      A               28
Hershey's   B               23
Hershey's   B               23
Hershey's   A               65
Mars        A               23
Mars        B               34",
header = TRUE,
stringsAsFactors = FALSE)

Upvotes: 1

Answers (5)

Matt Summersgill

Reputation: 4242

Using the provided example data and data.table:

library(data.table)
setDT(candyData)
candyData[,.(Max = max(value)), keyby = .(Brand,Candy)]

gives

       Brand Candy Max
1: Hershey's     A  65
2: Hershey's     B  23
3:      Mars     A  23
4:      Mars     B  34
5:    Nestle     A  32
6:    Nestle     B  34

Upvotes: 0

chinsoon12

Reputation: 25225

Just throwing in a few more solutions:

cd <- read.table(
    text = "
    Brand       Candy           value
    Nestle      A               12
    Nestle      B               34
    Nestle      A               32
    Hershey's   A               55
    Hershey's   B               14
    Hershey's   B               19
    Mars        B               24
    Nestle      B               26
    Nestle      A               28
    Hershey's   B               23
    Hershey's   B               23
    Hershey's   A               65
    Mars        A               23
    Mars        B               34",
    header = TRUE,
    stringsAsFactors = FALSE)

#using split + lapply or equivalently, by
c(by(cd$value, paste(cd$Brand, cd$Candy), max))

#using tapply i.e. apply to each group
tapply(cd$value, paste(cd$Brand, cd$Candy), max)

#using data.table
library(data.table)
setDT(cd)[, .(Max=max(value)), by=.(Brand, Candy)]

#using sqldf
library(sqldf)
sqldf("select Brand, Candy, max(value) as Max from cd group by Brand, Candy")

Upvotes: 1

Jordi

Reputation: 1343

aggregate(value ~ ., candyData, max)

This groups candyData by Brand and Candy (since they are all columns other than value; . does this) and gives the max of value for each group.

Upvotes: 2

Tobias

Reputation: 564

Although my answer is by far not as elegant as the one using dplyr, I created a solution using base R.

splittedData <- split(candyData,candyData$Brand)
resultDf <- data.frame(matrix(ncol = 3))
colnames(resultDf) <- c("Brand", "Candy", "maxValue")
insertIndex<-1
for(dfIndex in 1:length(splittedData)) {
  tempDf <- splittedData[[dfIndex]]
  tableDf <- data.frame(table(tempDf$Candy))
  tableDf[,1] <- as.character(tableDf[,1])
  for(i in 1:nrow(tableDf)) {
    resultDf[insertIndex, 1] <- tempDf$Brand[1]
    resultDf[insertIndex, 2] <- tableDf[i,1]
    resultDf[insertIndex, 3] <- max(tempDf$value[tempDf$Candy==tableDf[i,1]])
    insertIndex <- insertIndex + 1
  }
}

Output is a new df:

  Brand     Candy maxValue
1 Hershey's     A       65
2 Hershey's     B       23
3      Mars     A       23
4      Mars     B       34
5    Nestle     A       32
6    Nestle     B       34

Upvotes: 0

Stephan

Reputation: 2246

try this:

library(dplyr)
candyData %>% 
  group_by(Brand, Candy) %>% 
  summarise(max=max(value))

output will be:

# A tibble: 6 x 3
# Groups:   Brand [?]
  Brand     Candy   max
  <chr>     <chr> <dbl>
1 Hershey's A       65.
2 Hershey's B       23.
3 Mars      A       23.
4 Mars      B       34.
5 Nestle    A       32.
6 Nestle    B       34.

Upvotes: 2

Creating Subset of data table in R based on another column condition

Answers (5)

Related Questions