Summing rows grouped by another parameter in R

Question

I am trying to calculate rates for time-on-condition parameters. The code below calculates the desired rates correctly, but I suspect there is a more succinct way to do this with data.table. Any suggestions?

Background: for each run number there are 10 record numbers. Each record number refers to a value bin (the full range of values for each parameter is split into 10 equal-sized bins), and the values are counts of time spent in each bin. I want to sum the P1 counts within each run number (calling this total opHours for that run), and then divide each bin count by opHours to get the proportion of the run spent in each bin.
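For a single run, the calculation reduces to a few lines of arithmetic. The sketch below uses three made-up bin counts (three bins rather than ten, purely to keep the numbers readable); the values are hypothetical and not taken from the question's data:

```r
# Hypothetical time counts for one run, one value per bin
counts <- c(2, 3, 5)

# opHours for this run: the total time across all bins (10 here)
opHours <- sum(counts)

# Proportion of the run spent in each bin: 0.2, 0.3, 0.5
rates <- counts / opHours

# The proportions for a single run always sum to 1
stopifnot(isTRUE(all.equal(sum(rates), 1)))
```

Because every bin in a run is divided by the same run total, the rates within each run sum to 1, which is a convenient sanity check on the full table.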

library(data.table)

#### Create dummy parameter values
P1 <- rnorm(2000, mean = 400, sd = 50)
Date <- seq(from = as.Date("2010/1/1"), by = "day", length.out = length(P1))
RECORD_NUMBER <- rep(1:10, times = 200)
RUN_NUMBER <- rep(1:200, each = 10)

#### Combine the dummy parameters into a data frame
data <- data.frame(Date, RECORD_NUMBER, RUN_NUMBER, P1)

#### Convert to a data.table (by reference) and sum P1 within each run to get opHours
setDT(data)
running_hours_table <- data[ , .(opHours = sum(P1)), by = .(RUN_NUMBER)]

#### Set the join keys for data and running_hours_table
setkey(data, RUN_NUMBER)
setkey(running_hours_table, RUN_NUMBER)

#### Join opHours onto every row of data, matching on RUN_NUMBER
data <- data[running_hours_table]

#### Proportion of each run's time spent in each bin
data[ , P1.countRate := P1 / opHours]

Is it possible to generate the opHours column directly in the data.table, without first creating a separate table and then joining it back?

minem · Accepted Answer

data[ , opHours := sum(P1), by = .(RUN_NUMBER)]
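A minimal sketch of the full workflow with this approach, including the rate calculation from the question. It uses a hypothetical 2-run, 3-bin toy table instead of the question's 200 x 10 dummy data, so the results can be checked by hand:

```r
library(data.table)

# Hypothetical toy data: 2 runs x 3 bins (the question uses 200 runs x 10 bins)
data <- data.table(
  RUN_NUMBER    = rep(1:2, each = 3),
  RECORD_NUMBER = rep(1:3, times = 2),
  P1            = c(2, 3, 5, 1, 1, 2)
)

# Grouped assignment by reference: every row receives its run's total,
# so no helper table, setkey() or join is needed.
# Run 1 gets opHours 10, run 2 gets opHours 4.
data[ , opHours := sum(P1), by = .(RUN_NUMBER)]

# Plain column arithmetic, also by reference:
# run 1 -> 0.2, 0.3, 0.5; run 2 -> 0.25, 0.25, 0.5
data[ , P1.countRate := P1 / opHours]

print(data)
```

Since := modifies the table in place, the result does not need to be reassigned. If the opHours column itself is not needed, the rate can also be computed in one grouped call: data[ , P1.countRate := P1 / sum(P1), by = .(RUN_NUMBER)].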

It is worth reading some introductory material on data.table, such as the Getting-started page on the data.table wiki or the data.table cheat sheet.
