K. Jean

Reputation: 3

Create a month column by summing over another column in a data.frame

In R, I am trying to create a month column to plot my data with. I build it by summing over another column (ORIG_ROW) that holds the same value for every row of a given population, so the month counter restarts at 1 for each population. For example:

NAME ORIG_ROW MONTH
POP1 1        1
POP1 1        2
POP1 1        3
POP2 2        1
POP2 2        2
POP2 2        3

I am able to do this with:

df$MONTH <- sapply(seq_len(nrow(df)), function(i) sum(df$ORIG_ROW[seq_len(i)] == df$ORIG_ROW[i]))

However, this code is inefficient when I try to apply it to a large dataset (~825k observations).

Does anyone have suggestions on how to make this code more efficient?

Upvotes: 0

Views: 34

Answers (1)

Rui Barradas

Reputation: 76402

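What you want can be done with a single call to ave, grouping the column by itself. Within each group of equal ORIG_ROW values, seq_along numbers the rows 1, 2, 3, and so on, which is exactly the month counter.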

df$MONTH <- with(df, ave(ORIG_ROW, ORIG_ROW, FUN = seq_along))

DATA.

df <-
structure(list(NAME = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("POP1", 
"POP2"), class = "factor"), ORIG_ROW = c(1L, 1L, 1L, 2L, 2L, 
2L)), row.names = c(NA, -6L), class = "data.frame")
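
As a quick sanity check, here is a minimal sketch that applies the ave call to a small data set shaped like the one in the question and compares it with the row-by-row approach. The data frame is rebuilt with data.frame() so the snippet runs on its own, and the name slow is only illustrative.

```r
# Sample data shaped like the question's example (rebuilt here so the snippet is self-contained)
df <- data.frame(
  NAME = rep(c("POP1", "POP2"), each = 3),
  ORIG_ROW = rep(1:2, each = 3)
)

# Grouped counter: seq_along restarts at 1 within each ORIG_ROW group
df$MONTH <- with(df, ave(ORIG_ROW, ORIG_ROW, FUN = seq_along))

# Row-by-row version for comparison: count matches of the current
# ORIG_ROW value among rows 1..i (rescans the column for every row)
slow <- sapply(seq_len(nrow(df)), function(i) {
  sum(df$ORIG_ROW[seq_len(i)] == df$ORIG_ROW[i])
})

df$MONTH
# 1 2 3 1 2 3
all(df$MONTH == slow)
# TRUE
```

The row-by-row version rescans the column up to row i for every row, so its cost grows roughly quadratically with the number of rows, while ave splits the column into groups once. That is likely where most of the difference at ~825k observations comes from.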

Upvotes: 1
