Defining multiple rows based on data subset in a vectorized manner

Question

Suppose I have the following dataframe:

data <- data.frame(
    row_idx = 1:120,
    value = runif(120, min = 0, max = 1)
)
data$index <- rep(seq_len(nrow(data) / 4), each = 4)

data$value[data$row_idx %% 4 != 0] <- NA

This should give something like:

  row_idx     value index
1       1        NA     1
2       2        NA     1
3       3        NA     1
4       4 0.1463743     1
5       5        NA     2
6       6        NA     2
7       7        NA     2
8       8 0.2197675     2
...

I know this may look silly; I'm just trying to reproduce the problem.

The issue in question is: How can I, for each index group (1, 1, 1, 1; 2, 2, 2, 2; ...), make the value column equal to its single non-NA value divided by 4 and then replicated across the NA rows?

The expected output would be, for example:

# desired output
  row_idx      value index
1       1 0.03659358     1
2       2 0.03659358     1
3       3 0.03659358     1
4       4 0.03659358     1
5       5 0.05494188     2
6       6 0.05494188     2
7       7 0.05494188     2
8       8 0.05494188     2
...

Notice how the first four values are simply 0.1463743 / 4.

I know this can be solved with loops, apply functions, and the like, but is there a vectorized way of doing this? A one- or two-liner, tops?

PKumar · Accepted Answer

You can try ave from base R:

data$newcol <- ave(data$value, data$index, FUN = function(x) sum(x, na.rm = TRUE) / NROW(x))
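
To see why this works, here is a minimal sketch that rebuilds the data from the question and checks the result. The `set.seed(42)` call and the helper variable `original` are assumptions added for illustration and are not part of the question or the answer. Since each group holds exactly one non-NA value, `sum(x, na.rm = TRUE)` returns that value, and `NROW(x)` is the group size (4 here).

```r
set.seed(42)  # assumption: fixed seed so the run is reproducible

# Rebuild the example data from the question
data <- data.frame(
    row_idx = 1:120,
    value = runif(120, min = 0, max = 1)
)
data$index <- rep(seq_len(nrow(data) / 4), each = 4)
data$value[data$row_idx %% 4 != 0] <- NA

# Hypothetical helper: the single non-NA value of each group, in group order
original <- data$value[!is.na(data$value)]

# Per group: sum the non-NA values and divide by the number of rows in the group
data$newcol <- ave(data$value, data$index,
                   FUN = function(x) sum(x, na.rm = TRUE) / NROW(x))

# Every row of a group should now hold that group's value divided by 4
stopifnot(isTRUE(all.equal(data$newcol, rep(original / 4, each = 4))))

head(data, 8)
```

Note that this relies on there being exactly one non-NA value per group. A group containing only NA would come out as 0 rather than NA, because `sum(x, na.rm = TRUE)` of an all-NA vector is 0. To overwrite the original column instead of adding a new one, assign the result to `data$value`.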
