Parker
Parker

Reputation: 303

R - using aggregate on several variables and combine unique values

Suppose I have a dataframe cars, and the first several lines are:

Brand         Type         Year
BMW           Compact      2009
BMW           Sedan        2010
BENZ          Sedan        2010
BENZ          Compact      2012
BMW           Compact      2008
BENZ          Sedan        2011

I want to use aggregate to first find each combination of variable "Brand" and "Type", and then find the unique number of years of each combination. For example, the desired output is like:

Brand        Type         num_unique_years
BMW          Compact      2(which are 2009, 2008)
BMW          Sedan        1(2010)
Benz         Compact      1(2012)
Benz         Sedan        2(2010,2011)

The steps are basically this:

x <- subset(cars, Brand == "bmw" & Type == "compact")
length(unique(x$Year))
which gives me the output 2

However, I don't know how to combine these individual steps in one function.

Thanks for help

Upvotes: 1

Views: 266

Answers (3)

edgararuiz
edgararuiz

Reputation: 675

How about using dplyr:

library(dplyr) count(group_by(count(group_by(cars,Brand,Type, Year)),Brand,Type))

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 270348

Just define the appropriate aggregation function and use aggregate. No packages are used.

len_years <- function(years) {
  u <- unique(sort(years))
  paste0(length(u), "(", toString(u), ")")
}
Ag <- aggregate(Year ~., cars, len_years)
names(Ag)[3] <- "num_unique_years"

giving:

> Ag
  Brand    Type num_unique_years
1  BENZ Compact          1(2012)
2   BMW Compact    2(2008, 2009)
3  BENZ   Sedan    2(2010, 2011)
4   BMW   Sedan          1(2010)

variations

1) If you don't need the years themselves then replace the function with

len_years <- function(years) length(unique(years))

2) Alternately, replace the aggregate statement and the next statement with:

Ag <- aggregate(data.frame(num_unique_years = cars[[3]]), cars[-3], len_years)

Note: The input cars in reproducible form is:

Lines <- "Brand         Type         Year
BMW           Compact      2009
BMW           Sedan        2010
BENZ          Sedan        2010
BENZ          Compact      2012
BMW           Compact      2008
BENZ          Sedan        2011"
cars <- read.table(text = Lines, header = TRUE)

Upvotes: 1

BLT
BLT

Reputation: 2552

I can do it in 2 steps with data.table:

library(data.table)
dt <- data.table(brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
                 type = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
                 year = c(2009, 2010, 2010, 2012, 2008, 2011))


dt[ , num_unique_years := length(unique(year)), by = .(brand, type)]
unique(dt[, .(type, brand, num_unique_years)])

The final result:

      type brand num_unique_years
1: Compact   BMW                2
2:   Sedan   BMW                1
3:   Sedan  BENZ                2
4: Compact  BENZ                1

Upvotes: 1

Related Questions