Reputation: 303
Suppose I have a dataframe cars, and the first several lines are:
Brand Type Year
BMW Compact 2009
BMW Sedan 2010
BENZ Sedan 2010
BENZ Compact 2012
BMW Compact 2008
BENZ Sedan 2011
I want to use aggregate to first find each combination of variable "Brand" and "Type", and then find the unique number of years of each combination. For example, the desired output is like:
Brand Type num_unique_years
BMW Compact 2(which are 2009, 2008)
BMW Sedan 1(2010)
Benz Compact 1(2012)
Benz Sedan 2(2010,2011)
The steps are basically this:
x <- subset(cars, Brand == "bmw" & Type == "compact")
length(unique(x$Year))
which gives me the output 2
However, I don't know how to combine these individual steps in one function.
Thanks for help
Upvotes: 1
Views: 266
Reputation: 675
How about using dplyr:
library(dplyr)
count(group_by(count(group_by(cars,Brand,Type, Year)),Brand,Type))
Upvotes: 0
Reputation: 270348
Just define the appropriate aggregation function and use aggregate
. No packages are used.
len_years <- function(years) {
u <- unique(sort(years))
paste0(length(u), "(", toString(u), ")")
}
Ag <- aggregate(Year ~., cars, len_years)
names(Ag)[3] <- "num_unique_years"
giving:
> Ag
Brand Type num_unique_years
1 BENZ Compact 1(2012)
2 BMW Compact 2(2008, 2009)
3 BENZ Sedan 2(2010, 2011)
4 BMW Sedan 1(2010)
variations
1) If you don't need the years themselves then replace the function with
len_years <- function(years) length(unique(years))
2) Alternately, replace the aggregate
statement and the next statement with:
Ag <- aggregate(data.frame(num_unique_years = cars[[3]]), cars[-3], len_years)
Note: The input cars
in reproducible form is:
Lines <- "Brand Type Year
BMW Compact 2009
BMW Sedan 2010
BENZ Sedan 2010
BENZ Compact 2012
BMW Compact 2008
BENZ Sedan 2011"
cars <- read.table(text = Lines, header = TRUE)
Upvotes: 1
Reputation: 2552
I can do it in 2 steps with data.table:
library(data.table)
dt <- data.table(brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
type = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
year = c(2009, 2010, 2010, 2012, 2008, 2011))
dt[ , num_unique_years := length(unique(year)), by = .(brand, type)]
unique(dt[, .(type, brand, num_unique_years)])
The final result:
type brand num_unique_years
1: Compact BMW 2
2: Sedan BMW 1
3: Sedan BENZ 2
4: Compact BENZ 1
Upvotes: 1