Dan Blanchard

Reputation: 23

Use fill() to conditionally fill NA values without a loop

I'm looking for a way to conditionally use the fill() command or something similar to populate NA values based on the previous row value, provided a condition is met. I've looked for a solution, but haven't been able to get anything I've found to work.

The data looks something like below, but there are multiple 'caseval' measurements:

casedate = seq(as.Date('2018/1/1'),as.Date('2018/3/5'),by='week')
caseid = c(rep(1,10),rep(2,10),rep(3,10))
caseval = c(80,rep(NA,4),rep(80,5),40,rep(NA,2),rep(40,4),rep(50,3),rep(NA,7),rep(70,3))
df = cbind.data.frame(casedate,caseid, caseval)

Each record represents a date and a measurement. For some cases, the measurement was skipped when it didn't change, so those NA values should take the previous value. For others, no measurement was relevant on that date, which is indicated by the first record for that caseid being NA (row 21 below). When I use fill without conditions, rows 21:27 are filled with the last value from caseid 2, which is incorrect.

     casedate caseid caseval
1  2018-01-01      1      80
2  2018-01-08      1      NA
3  2018-01-15      1      NA
4  2018-01-22      1      NA
5  2018-01-29      1      NA
6  2018-02-05      1      80
7  2018-02-12      1      80
8  2018-02-19      1      80
9  2018-02-26      1      80
10 2018-03-05      1      80
11 2018-01-01      2      40
12 2018-01-08      2      NA
13 2018-01-15      2      NA
14 2018-01-22      2      40
15 2018-01-29      2      40
16 2018-02-05      2      40
17 2018-02-12      2      40
18 2018-02-19      2      50
19 2018-02-26      2      50
20 2018-03-05      2      50
21 2018-01-01      3      NA
22 2018-01-08      3      NA
23 2018-01-15      3      NA
24 2018-01-22      3      NA
25 2018-01-29      3      NA
26 2018-02-05      3      NA
27 2018-02-12      3      NA
28 2018-02-19      3      70
29 2018-02-26      3      70
30 2018-03-05      3      70

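I have tried a loop, which works but is very slow:

# list_casevals is assumed to hold the names of the measurement columns
for (i in 2:nrow(df)) {  # start at 2 so that row i-1 exists
  for (item in list_casevals) {
    # only copy the previous value down within the same caseid
    if (df[i,'caseid']==df[i-1,'caseid'] && is.na(df[i,item])) {
      df[i,item]=df[i-1,item]
    }
  }
}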

I've also tried using fill inside an ifelse, but it only replaces the NA in row 2. The other rows stay NA unless I run it again, at which point row 3 is replaced, and so on.

df = df %>%
  mutate(., caseval = ifelse(lag(caseid)==caseid & is.na(caseval),fill(caseval),caseval))

Is there a faster way to do this than using the loop?

Upvotes: 2

Views: 124

Answers (1)

Melissa Key

Reputation: 4551

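Can you just group by caseid? fill() is applied within each group of a grouped data frame, so values won't carry over from one caseid to the next. For example,

library(dplyr)
library(tidyr)

df <- df %>%
  group_by(caseid) %>%
  fill(caseval) %>%
  ungroup()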
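Since the question mentions several measurement columns, here is a minimal sketch of the same grouped fill applied to more than one column at once. It rebuilds the example data from the question and adds a second column, caseval2, which is made up purely for illustration. list_casevals is assumed to be a character vector of the measurement column names, as the loop in the question suggests.

```r
library(dplyr)
library(tidyr)

# Example data from the question, plus a hypothetical second
# measurement column (caseval2) that is not in the original post.
df <- data.frame(
  casedate = rep(seq(as.Date("2018-01-01"), as.Date("2018-03-05"), by = "week"),
                 times = 3),
  caseid   = rep(1:3, each = 10),
  caseval  = c(80, rep(NA, 4), rep(80, 5),
               40, rep(NA, 2), rep(40, 4), rep(50, 3),
               rep(NA, 7), rep(70, 3)),
  caseval2 = c(rep(NA, 3), rep(5, 7),
               1, rep(NA, 9),
               2, NA, 3, rep(NA, 7))
)

# Assumed: names of all measurement columns to fill
list_casevals <- c("caseval", "caseval2")

filled <- df %>%
  group_by(caseid) %>%                                    # fill never crosses a caseid boundary
  fill(all_of(list_casevals), .direction = "down") %>%    # carry the last non-NA value down
  ungroup()
```

With this, the leading NA values of a group (rows 21:27 of caseval here) stay NA, because there is no earlier value within the same caseid to carry down.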

Upvotes: 4
