Henkpenk
Henkpenk

Reputation: 35

get the sum of one column and average of the other with grouping

I have a dataframe which looks like this:

    Year Iteration Production Technology
    2015     1     200        Gas
    2015     1     305        Gas
    2016     1     150        Gas
    2016     1     200        Gas
    2015     2     200        Gas 

The Technology column states the type of power plants. Therefore, for each year and each iteration there can be multiple plants in the Technology column.

I would like to aggregate the data so that I have one value for the year, the mean of all the iterations, and the sum of all the power plants for that specific technology.

In this example it would be:

    Year Iteration Production Technology 
    2015 1.5       705        Gas
    2016 1.5       350        Gas

I have tried various approaches with the aggregate function, but failed due to the fact that it would also aggregate the Technology column (dividing the total production by the amount of power plants).

Upvotes: 2

Views: 60

Answers (2)

M--
M--

Reputation: 28826

Another approach in data.table:

library(data.table)

dt1[ , list(Iteration=mean(Iteration), 
            Production=sum(Production)), 
                                        by=list(Year,Technology)]

#>    Year Technology Iteration Production
#> 1: 2015        Gas  1.333333        705
#> 2: 2016        Gas  1.000000        350

As a side effect of my recent obsession with , this is a solution using sqldf package:

library(sqldf)

sqldf("select Year, Technology, 
       avg(Iteration) as AVG_Iteration, sum(Production) as TOT_Production
       from dt1 
       group by Year, Technology", drv="SQLite")

#>   Year Technology AVG_Iteration TOT_Production
#> 1 2015        Gas      1.333333            705
#> 2 2016        Gas      1.000000            350

Created on 2019-06-19 by the reprex package (v0.3.0)

Data:

dt1 <- fread(input = "  Year Iteration Production Technology
                        2015     1     200        Gas
                        2015     1     305        Gas
                        2016     1     150        Gas
                        2016     1     200        Gas
                        2015     2     200        Gas ")

Upvotes: 3

akrun
akrun

Reputation: 886938

An option would be tidyverse to group by 'Year', 'Technology', get the mean of 'Iteration' and sum of 'Production'

library(tidyverse)
df1 %>% 
     group_by(Year, Technology) %>%
     summarise(Iteration = mean(Iteration),
               Production = sum(Production))
# A tibble: 2 x 4
# Groups:   Year [2]
#   Year Technology Iteration Production
#  <int> <chr>          <dbl>      <int>
#1  2015 Gas             1.33        705
#2  2016 Gas             1           350

data

df1 <- structure(list(Year = c(2015L, 2015L, 2016L, 2016L, 2015L), Iteration = c(1L, 
1L, 1L, 1L, 2L), Production = c(200L, 305L, 150L, 200L, 200L), 
    Technology = c("Gas", "Gas", "Gas", "Gas", "Gas")), 
    class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 3

Related Questions