Reputation: 79
I want to create a subset of the below candyData in R in such a way that I should group the data by Brand and for each unique Brand, I should find and print the maximum of A and B. For illustration the new data should have Brand value Nestle appearing twice, the corresponding Candy Value A and B both appearing once corresponding to Nestle and their maximum values in the third column, Similarly for all brand values. Thanks and Please help.
candyData <- read.table(
text = "
Brand Candy value
Nestle A 12
Nestle B 34
Nestle A 32
Hershey's A 55
Hershey's B 14
Hershey's B 19
Mars B 24
Nestle B 26
Nestle A 28
Hershey's B 23
Hershey's B 23
Hershey's A 65
Mars A 23
Mars B 34",
header = TRUE,
stringsAsFactors = FALSE)
Upvotes: 1
Views: 799
Reputation: 4242
Using the provided example data and data.table
:
library(data.table)
setDT(candyData)
candyData[,.(Max = max(value)), keyby = .(Brand,Candy)]
gives
Brand Candy Max
1: Hershey's A 65
2: Hershey's B 23
3: Mars A 23
4: Mars B 34
5: Nestle A 32
6: Nestle B 34
Upvotes: 0
Reputation: 25225
Just throwing in a few more solutions:
cd <- read.table(
text = "
Brand Candy value
Nestle A 12
Nestle B 34
Nestle A 32
Hershey's A 55
Hershey's B 14
Hershey's B 19
Mars B 24
Nestle B 26
Nestle A 28
Hershey's B 23
Hershey's B 23
Hershey's A 65
Mars A 23
Mars B 34",
header = TRUE,
stringsAsFactors = FALSE)
#using split + lapply or equivalently, by
c(by(cd$value, paste(cd$Brand, cd$Candy), max))
#using tapply i.e. apply to each group
tapply(cd$value, paste(cd$Brand, cd$Candy), max)
#using data.table
library(data.table)
setDT(cd)[, .(Max=max(value)), by=.(Brand, Candy)]
#using sqldf
library(sqldf)
sqldf("select Brand, Candy, max(value) as Max from cd group by Brand, Candy")
Upvotes: 1
Reputation: 1343
aggregate(value ~ ., candyData, max)
This groups candyData
by Brand
and Candy
(since they are all columns other than value
; .
does this) and gives the max
of value
for each group.
Upvotes: 2
Reputation: 564
Although my answer is by far not as elegant as the one using dplyr
, I created a solution using base R.
splittedData <- split(candyData,candyData$Brand)
resultDf <- data.frame(matrix(ncol = 3))
colnames(resultDf) <- c("Brand", "Candy", "maxValue")
insertIndex<-1
for(dfIndex in 1:length(splittedData)) {
tempDf <- splittedData[[dfIndex]]
tableDf <- data.frame(table(tempDf$Candy))
tableDf[,1] <- as.character(tableDf[,1])
for(i in 1:nrow(tableDf)) {
resultDf[insertIndex, 1] <- tempDf$Brand[1]
resultDf[insertIndex, 2] <- tableDf[i,1]
resultDf[insertIndex, 3] <- max(tempDf$value[tempDf$Candy==tableDf[i,1]])
insertIndex <- insertIndex + 1
}
}
Output is a new df:
Brand Candy maxValue
1 Hershey's A 65
2 Hershey's B 23
3 Mars A 23
4 Mars B 34
5 Nestle A 32
6 Nestle B 34
Upvotes: 0
Reputation: 2246
try this:
library(dplyr)
candyData %>%
group_by(Brand, Candy) %>%
summarise(max=max(value))
output will be:
# A tibble: 6 x 3
# Groups: Brand [?]
Brand Candy max
<chr> <chr> <dbl>
1 Hershey's A 65.
2 Hershey's B 23.
3 Mars A 23.
4 Mars B 34.
5 Nestle A 32.
6 Nestle B 34.
Upvotes: 2