Suppose I have the data frame:
df <- data.frame(Year = rep(1:3, each = 5),
                 Terminal = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2),
                 day = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2),
                 Capacity = sample(1:15))
and I am trying to add a column "X" that holds, on every row, the sum of Capacity over all rows with the same Year, day and Terminal.
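To make the target concrete, here is a brute-force sketch of what X should contain. It assumes a fixed Capacity of 1:15 instead of sample(1:15) so the sums are reproducible; the loop just applies the definition row by row and is meant for illustration, not speed.

```r
# Assumed fixed data: same Year/Terminal/day as above, but Capacity = 1:15
# instead of sample(1:15) so the expected sums are reproducible.
df <- data.frame(Year = rep(1:3, each = 5),
                 Terminal = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2),
                 day = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2),
                 Capacity = 1:15)

# For each row i, add up Capacity over all rows sharing row i's
# Year, day and Terminal. This gives one value per row of df,
# so it can be stored directly as a new column.
df$X <- sapply(seq_len(nrow(df)), function(i) {
  same_group <- df$Year == df$Year[i] &
    df$day == df$day[i] &
    df$Terminal == df$Terminal[i]
  sum(df$Capacity[same_group])
})

# With this data, X is 15 15 15 15 15 6 34 34 34 34 51 51 51 14 51
df$X
```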
I use the code below to do the calculation:
aggregate(Capacity ~ Terminal + Year + day , data=df, FUN=sum)
and
as.data.table(df)[, sum(Capacity), by = .(Terminal, Year, day)]
but when I try to create the new column, it only contains the values 1 or 2 instead of the sums, and I also get the warning below. The code I use for X is:
df["X"] <- aggregate(Capacity ~ Terminal + Year + day, data = df, FUN = sum)
Warning message:
In `[<-.data.frame`(`*tmp*`, "X", value = list(Terminal = c(1, 1,  :
  provided 4 variables to replace 1 variables
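A quick way to see where the warning comes from is to inspect what aggregate actually returns. The sketch below reuses the assumed fixed data (Capacity = 1:15): the result is a separate data frame with one row per group and four columns, so it cannot be dropped into a single column of df. Its first column is Terminal, which only takes the values 1 and 2; that is presumably what ends up in X.

```r
# Assumed fixed data (Capacity = 1:15) for a reproducible result.
df <- data.frame(Year = rep(1:3, each = 5),
                 Terminal = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2),
                 day = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2),
                 Capacity = 1:15)

agg <- aggregate(Capacity ~ Terminal + Year + day, data = df, FUN = sum)

# One row per (Terminal, Year, day) group, not one row per row of df,
# and four columns instead of the single column "X" being assigned.
dim(agg)    # 5 rows, 4 columns
names(agg)  # "Terminal" "Year" "day" "Capacity"
```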
aggregate returns a summarised output (one row per group); it does not create a new column in the original data. Instead, we can use mutate from dplyr:
library(dplyr)
df %>%
  group_by(Year, day, Terminal) %>%
  mutate(X = sum(Capacity))
For the data.table approach, we need to assign with := to create a new column:
as.data.table(df)[, X := sum(Capacity), by = .(Terminal, Year, day)]
Or with ave from base R:
df$X <- with(df, ave(Capacity, Year, day, Terminal, FUN = sum))
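If you would rather keep the aggregate call from the question, one base R option (not part of the original answer) is to merge its summary back onto the rows. This is a sketch assuming the fixed Capacity = 1:15 data; row_id is a hypothetical helper column used only to restore the original row order, because merge() sorts the result by the key columns.

```r
# Assumed fixed data (Capacity = 1:15) for a reproducible result.
df <- data.frame(Year = rep(1:3, each = 5),
                 Terminal = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2),
                 day = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2),
                 Capacity = 1:15)

# Summarise once per group, then rename the summed column to X.
agg <- aggregate(Capacity ~ Terminal + Year + day, data = df, FUN = sum)
names(agg)[names(agg) == "Capacity"] <- "X"

# Attach the group totals to every matching row; row_id (hypothetical helper)
# lets us put the rows back in their original order after merge() sorts them.
out <- merge(transform(df, row_id = seq_len(nrow(df))), agg,
             by = c("Terminal", "Year", "day"))
out <- out[order(out$row_id), ]
out$row_id <- NULL
rownames(out) <- NULL

out
```

The ave one-liner is shorter; the merge route mainly helps when the summary table is needed on its own as well.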