Query and aggregate data based on conditions in R

Question

I have a data frame and, for each year, I want the mean of the type b values in those columns where the type a row equals 1.

Year  type   value1   value2  value3  value4  value5
1     a       1        1        2       3       4
1     b       10       12       9       8       10
2     a       1        2        2       2       1
2     b       11       10       13      9       14

so that my final product looks like this:

Year  type_b_values
1      11
2      12.5

These are the average of value1 and value2 for Year 1, and the average of value1 and value5 for Year 2. Thanks!

Tyler Rinker · Accepted Answer

Here is an approach using base functions. I'm guessing plyr or reshape may be useful packages here as well but I'm much less familiar with them:

dat <- read.table(text="Year  type   value1   value2  value3  value4  value5
1     a       1        1        2       3       4
1     b       10       12       9       8       10
2     a       1        2        2       2       1
2     b       11       10       13      9       14", header=TRUE)


dat_split <- split(dat, dat$Year)       # split our data into a list by year

output <- sapply(dat_split, function(x) {
    y <- x[x$type == "a", -c(1:2)] == 1 # logical mask: value columns where a == 1
    if (sum(y) == 0) {                  # no a == 1 in this year, so return NA
        return(NA)
    }
    z <- x[x$type == "b", -c(1:2)][y]   # b values in the columns where a == 1
    mean(z)
})

data.frame(Year = names(output), type_b_values = output)

## > data.frame(Year = names(output), type_b_values = output)
##   Year type_b_values
## 1    1          11.0
## 2    2          12.5
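
The same masking idea can also be sketched without split/sapply, by blanking out the b values wherever the matching a value is not 1 and then taking row means. This is only an illustrative variant, not part of the original answer. It assumes every year has exactly one a row and one b row, and that the value columns are the ones whose names start with "value". The names value_cols, a_rows, b_rows, b_vals and result are made up for this example.

```r
dat <- read.table(text = "Year type value1 value2 value3 value4 value5
1 a 1 1 2 3 4
1 b 10 12 9 8 10
2 a 1 2 2 2 1
2 b 11 10 13 9 14", header = TRUE)

# Positions of the value columns (assumption: their names start with "value")
value_cols <- grep("^value", names(dat))

a_rows <- dat[dat$type == "a", ]
b_rows <- dat[dat$type == "b", ]

# Line the b rows up with the a rows by Year (assumes one a and one b per year)
b_rows <- b_rows[match(a_rows$Year, b_rows$Year), ]

# Set b values to NA wherever the corresponding a value is not 1
b_vals <- as.matrix(b_rows[, value_cols])
b_vals[as.matrix(a_rows[, value_cols]) != 1] <- NA

# Row means over the remaining b values
result <- data.frame(Year = a_rows$Year,
                     type_b_values = unname(rowMeans(b_vals, na.rm = TRUE)))

# A year with no a == 1 yields NaN from rowMeans; report it as NA instead
result$type_b_values[is.nan(result$type_b_values)] <- NA

result
```

For the sample data this gives 11 for Year 1 and 12.5 for Year 2, matching the output above.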
