Reputation: 35
I have a dataframe which looks like this:
Year Iteration Production Technology
2015 1 200 Gas
2015 1 305 Gas
2016 1 150 Gas
2016 1 200 Gas
2015 2 200 Gas
The Technology column states the type of power plants. Therefore, for each year and each iteration there can be multiple plants in the Technology column.
I would like to aggregate the data so that I have one value for the year, the mean of all the iterations, and the sum of all the power plants for that specific technology.
In this example it would be:
Year Iteration Production Technology
2015 1.5 705 Gas
2016 1.5 350 Gas
I have tried various approaches with the aggregate function, but failed due to the fact that it would also aggregate the Technology column (dividing the total production by the amount of power plants).
Upvotes: 2
Views: 60
Reputation: 28826
Another approach in data.table
:
library(data.table)
dt1[ , list(Iteration=mean(Iteration),
Production=sum(Production)),
by=list(Year,Technology)]
#> Year Technology Iteration Production
#> 1: 2015 Gas 1.333333 705
#> 2: 2016 Gas 1.000000 350
As a side effect of my recent obsession with sql, this is a solution using sqldf
package:
library(sqldf)
sqldf("select Year, Technology,
avg(Iteration) as AVG_Iteration, sum(Production) as TOT_Production
from dt1
group by Year, Technology", drv="SQLite")
#> Year Technology AVG_Iteration TOT_Production
#> 1 2015 Gas 1.333333 705
#> 2 2016 Gas 1.000000 350
Created on 2019-06-19 by the reprex package (v0.3.0)
Data:
dt1 <- fread(input = " Year Iteration Production Technology
2015 1 200 Gas
2015 1 305 Gas
2016 1 150 Gas
2016 1 200 Gas
2015 2 200 Gas ")
Upvotes: 3
Reputation: 886938
An option would be tidyverse
to group by 'Year', 'Technology', get the mean
of 'Iteration' and sum
of 'Production'
library(tidyverse)
df1 %>%
group_by(Year, Technology) %>%
summarise(Iteration = mean(Iteration),
Production = sum(Production))
# A tibble: 2 x 4
# Groups: Year [2]
# Year Technology Iteration Production
# <int> <chr> <dbl> <int>
#1 2015 Gas 1.33 705
#2 2016 Gas 1 350
df1 <- structure(list(Year = c(2015L, 2015L, 2016L, 2016L, 2015L), Iteration = c(1L,
1L, 1L, 1L, 2L), Production = c(200L, 305L, 150L, 200L, 200L),
Technology = c("Gas", "Gas", "Gas", "Gas", "Gas")),
class = "data.frame", row.names = c(NA,
-5L))
Upvotes: 3