Reputation: 3
In R, I am trying to create a MONTH column to plot my data against. It should count the rows within each population, using another column (ORIG_ROW) that has the same value for every row of a given population, e.g.:
NAME ORIG_ROW MONTH
POP1 1 1
POP1 1 2
POP1 1 3
POP2 2 1
POP2 2 2
POP2 2 3
I am able to do this with:
df$MONTH <- sapply(1:nrow(df), function(i) sum(df[1:i, 'ORIG_ROW'] == df$ORIG_ROW[i]))
However, this code is inefficient when I try to apply it to a large dataset (~825k observations).
Does anyone have suggestions on how to make this code more efficient?
Upvotes: 0
Views: 34
Reputation: 76402
What you want can be done with a simple call to ave, grouping a column by itself.
df$MONTH <- with(df, ave(ORIG_ROW, ORIG_ROW, FUN = seq_along))
DATA.
df <-
structure(list(NAME = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("POP1",
"POP2"), class = "factor"), ORIG_ROW = c(1L, 1L, 1L, 2L, 2L,
2L)), row.names = c(NA, -6L), class = "data.frame")
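As a quick sanity check, here is a minimal sketch comparing a row-by-row running count with the ave call on a small data frame shaped like the one in the question. The names slow and fast are only illustrative, and the running count is my reading of what the original sapply line was meant to compute.

```r
# Sample data mirroring the question (three rows per population).
df <- data.frame(
  NAME = rep(c("POP1", "POP2"), each = 3),
  ORIG_ROW = rep(1:2, each = 3)
)

# Row-by-row running count (assumed intent of the question's code).
# Each row rescans all rows above it, so the work grows quadratically.
slow <- sapply(seq_len(nrow(df)),
               function(i) sum(df$ORIG_ROW[1:i] == df$ORIG_ROW[i]))

# ave() splits ORIG_ROW by its own values and numbers the rows in each group.
fast <- with(df, ave(ORIG_ROW, ORIG_ROW, FUN = seq_along))

# Both approaches should agree on the sample data.
stopifnot(all(slow == fast))

df$MONTH <- fast
print(df)
```

Because ave splits the column once and calls seq_along once per group, it avoids rescanning earlier rows for every row, which is likely where most of the time goes on ~825k observations.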
Upvotes: 1