eduardokapp
eduardokapp

Reputation: 1751

Defining multiple rows based on data subset in a vectorized manner

Suppose I have the following dataframe:

data <- data.frame(
    row_idx = 1:120,
    value = runif(120, min = 0, max = 1)
)
data$index <- rep(seq_len(nrow(data) / 4), each = 4)

data$value[data$row_idx %% 4 != 0] <- NA

Which should give me something along:

  row_idx     value index
1       1        NA     1
2       2        NA     1
3       3        NA     1
4       4 0.1463743     1
5       5        NA     2
6       6        NA     2
7       7        NA     2
8       8 0.2197675     2
...

I know this may look silly, I'm just trying to reproduce the problem.

The issue in question is: How can I, for each index group (1, 1, 1, 1; 2, 2, 2, 2; ...), make the value column equal to its single non-NA value divided by 4 and then replicated across the NA rows?

The expected output would be, for example:

# desired output
  row_idx value index
1       1 0.03659358     1
2       2 0.03659358     1
3       3 0.03659358     1
4       4 0.03659358     1
5       5 0.05494188     2
6       6 0.05494188     2
7       7 0.05494188     2
8       8 0.05494188     2
...

Notice how the first four values are simply 0.1463743 / 4.

I know this can be solved by using loops, apply functions and such, but is there a vectorized way of doing this? A one, two liner tops?

Upvotes: 1

Views: 56

Answers (2)

PKumar
PKumar

Reputation: 11128

You can try ave from base R:

data$newcol <- ave(data$value, data$index ,  FUN=function(x)sum(x, na.rm=TRUE)/NROW(x))

Upvotes: 1

AnilGoyal
AnilGoyal

Reputation: 26218

do this

library(tidyverse)

data$value <- data$value /4
data %>% group_by(index) %>%
  fill(value, .direction = "up")

Check

set.seed(123)
data <- data.frame(row_idx = 1:120,
  value = runif(120, min = 0, max = 1))
data$index <- rep(seq_len(nrow(data) / 4), each = 4)

data$value[data$row_idx %% 4 != 0] <- NA

data$value <- data$value /4
data %>% group_by(index) %>%
  fill(value, .direction = "up")

# A tibble: 120 x 3
# Groups:   index [30]
   row_idx value index
     <int> <dbl> <int>
 1       1 0.221     1
 2       2 0.221     1
 3       3 0.221     1
 4       4 0.221     1
 5       5 0.223     2
 6       6 0.223     2
 7       7 0.223     2
 8       8 0.223     2
 9       9 0.113     3
10      10 0.113     3
# ... with 110 more rows

If you don't want to use pipes, use simply this

data <- fill(data, value, .direction = "up")
  • Another option
data$value <- rev(zoo::na.locf(rev(data$value/4)))

    row_idx      value index
1         1 0.22075435     1
2         2 0.22075435     1
3         3 0.22075435     1
4         4 0.22075435     1
5         5 0.22310476     2
6         6 0.22310476     2
7         7 0.22310476     2
8         8 0.22310476     2
9         9 0.11333354     3
10       10 0.11333354     3
11       11 0.11333354     3
12       12 0.11333354     3
13       13 0.22495624     4
14       14 0.22495624     4
15       15 0.22495624     4
16       16 0.22495624     4
17       17 0.23862591     5
18       18 0.23862591     5
19       19 0.23862591     5
20       20 0.23862591     5
21       21 0.24856744     6
22       22 0.24856744     6
23       23 0.24856744     6
24       24 0.24856744     6
25       25 0.14853551     7
26       26 0.14853551     7
27       27 0.14853551     7
28       28 0.14853551     7
29       29 0.22557476     8
30       30 0.22557476     8
31       31 0.22557476     8
32       32 0.22557476     8
33       33 0.11944899     9
34       34 0.11944899     9
35       35 0.11944899     9
36       36 0.11944899     9
37       37 0.05790645    10
38       38 0.05790645    10
39       39 0.05790645    10
40       40 0.05790645    10
41       41 0.09221136    11
42       42 0.09221136    11
43       43 0.09221136    11
44       44 0.09221136    11
45       45 0.11649061    12
46       46 0.11649061    12
47       47 0.11649061    12
48       48 0.11649061    12
49       49 0.11055002    13
50       50 0.11055002    13
51       51 0.11055002    13
52       52 0.11055002    13
53       53 0.05163285    14
54       54 0.05163285    14
55       55 0.05163285    14
56       56 0.05163285    14
57       57 0.09361569    15
58       58 0.09361569    15
59       59 0.09361569    15
60       60 0.09361569    15
61       61 0.06859591    16
62       62 0.06859591    16
63       63 0.06859591    16
64       64 0.06859591    16
65       65 0.20309738    17
66       66 0.20309738    17
67       67 0.20309738    17
68       68 0.20309738    17
69       69 0.15730528    18
70       70 0.15730528    18
71       71 0.15730528    18
72       72 0.15730528    18
73       73 0.05502972    19
74       74 0.05502972    19
75       75 0.05502972    19
76       76 0.05502972    19
77       77 0.02778386    20
78       78 0.02778386    20
79       79 0.02778386    20
80       80 0.02778386    20
81       81 0.19704896    21
82       82 0.19704896    21
83       83 0.19704896    21
84       84 0.19704896    21
85       85 0.22326278    22
86       86 0.22326278    22
87       87 0.22326278    22
88       88 0.22326278    22
89       89 0.16327548    23
90       90 0.16327548    23
91       91 0.16327548    23
92       92 0.16327548    23
93       93 0.04692278    24
94       94 0.04692278    24
95       95 0.04692278    24
96       96 0.04692278    24
97       97 0.12787636    25
98       98 0.12787636    25
99       99 0.12787636    25
100     100 0.12787636    25
101     101 0.23861846    26
102     102 0.23861846    26
103     103 0.23861846    26
104     104 0.23861846    26
105     105 0.15218375    27
106     106 0.15218375    27
107     107 0.15218375    27
108     108 0.15218375    27
109     109 0.07530722    28
110     110 0.07530722    28
111     111 0.07530722    28
112     112 0.07530722    28
113     113 0.03557357    29
114     114 0.03557357    29
115     115 0.03557357    29
116     116 0.03557357    29
117     117 0.10112757    30
118     118 0.10112757    30
119     119 0.10112757    30
120     120 0.10112757    30

Upvotes: 1

Related Questions