12666727b9

Reputation: 1127

Plotting missing data

I'm trying to plot the following dataset, imputed with the LOCF method, using this procedure:

> dati
# A tibble: 27 x 6
      id sex      d8   d10   d12   d14
   <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
 1     1 F      21    20    21.5  23  
 2     2 F      21    21.5  24    25.5
 3     3 NA     NA    24    NA    26  
 4     4 F      23.5  24.5  25    26.5
 5     5 F      21.5  23    22.5  23.5
 6     6 F      20    21    21    22.5
 7     7 F      21.5  22.5  23    25  
 8     8 F      23    23    23.5  24  
 9     9 F      NA    21    NA    21.5
10    10 F      16.5  19    19    19.5
# ... with 17 more rows

dati_locf <- dati %>% mutate(across(everything(),na.locf)) %>%
  mutate(across(everything(),na.locf,fromlast = T))

apply(dati_locf[which(dati_locf$sex=="F"),1:4], 1, function(x) lines(x, col = "green"))

However, when I run the last line to plot the dataset, it returns the following warning and error messages:

Warning in xy.coords(x, y) : NAs introduced by coercion
Error in plot.xy(xy.coords(x, y), type = type, ...) : 
  plot.new has not been called yet
Called from: plot.xy(xy.coords(x, y), type = type, ...)

Can you explain why this happens and how I could fix it? I have attached a screenshot of the page I was taken to after running it.

Upvotes: 0

Views: 537

Answers (2)

Steffen Moritz

Reputation: 7730

If you just want to plot the LOCF imputation for one variable to see how well the imputed values fit that variable, you can use the following:

library(imputeTS)
# Example 1: Visualize imputation by LOCF
imp_locf <- na_locf(tsAirgap)
ggplot_na_imputations(tsAirgap, imp_locf)

[Plot: tsAirgap with LOCF-imputed values]

tsAirgap is an example time series that comes with the imputeTS package. You would have to replace it with the time series / variable you want to plot. Imputed values are shown in red. As you can see, last observation carried forward is acceptable for this series, but the imputeTS package also provides algorithms that give a better result (e.g. na_kalman or na_seadec). Here is also an example of next observation carried backward, since you also used NOCB.

library(imputeTS)
# Example 2: Visualize imputation by NOCB
imp_nocb <- na_locf(tsAirgap, option = "nocb")
ggplot_na_imputations(tsAirgap, imp_nocb)

[Plot: tsAirgap with NOCB-imputed values]
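To apply the same idea to the data in the question, one option is to pass a single numeric column to na_locf. The following is only a sketch: the vector d8 is typed in by hand from the ten rows shown in the question, and treating a column across subjects as the series to impute mirrors what the question's code does rather than being a recommendation.

```r
library(imputeTS)

# First ten values of the d8 column, copied from the question (assumption:
# the remaining 17 rows are not shown, so they are not included here)
d8 <- c(21, 21, NA, 23.5, 21.5, 20, 21.5, 23, NA, 16.5)

# Last observation carried forward: each NA takes the previous observed value
d8_locf <- na_locf(d8)

# Plot the observed values together with the imputed ones
ggplot_na_imputations(d8, d8_locf)
```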

Upvotes: 1

G. Grothendieck

Reputation: 269526

There are several problems here:

  • apply converts its first argument to a matrix, and since the second column is character the result is a character matrix. Clearly one can't plot that with lines.
  • presumably we want to plot columns 3:6, not 1:4
  • na.locf will produce multiple values that are the same wherever there is an NA but what we really want is to connect non-NA points. Use na.approx instead.
  • lines can only be used after plot but there is no plot command. Use matplot instead.
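The first point can be checked on a small example. This is a minimal sketch with a made-up two-row data frame (df is hypothetical, not the asker's data); it relies on the fact that apply calls as.matrix on a data frame.

```r
# Hypothetical mini data frame with one character column
df <- data.frame(id = 1:2, sex = c("F", "F"), d8 = c(21, 21.5))

# Including the character column forces every cell to character
m_mixed <- as.matrix(df[, 1:3])
typeof(m_mixed)

# Restricting to numeric columns keeps the matrix numeric
m_num <- as.matrix(df[, 3, drop = FALSE])
typeof(m_num)
```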

Making these changes we have the following.

library(zoo)
# see Note below for dati in reproducible form
matplot(na.approx(dati[3:6]), type = "l", ylab = "")
legend("topright", names(dati)[3:6], col = 1:4, lty = 1:4)

[Plot: matplot output] (continued after plot)

We could alternately use ggplot2 graphics. First convert to zoo and then use na.approx and autoplot. Omit facet=NULL if you want separate panels.

library(ggplot2)
autoplot(na.approx(zoo(dati[3:6])), facet = NULL)
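If the aim is closer to the original attempt, that is, one green line per female subject across the four measurements, a possible sketch is shown below. The data frame is a hand-typed stand-in for the first five rows of dati, and reading the column names d8 to d14 as x positions 8, 10, 12 and 14 is an assumption.

```r
# Stand-in for the first five rows of dati from the question
dati <- data.frame(
  id  = 1:5,
  sex = c("F", "F", NA, "F", "F"),
  d8  = c(21, 21, NA, 23.5, 21.5),
  d10 = c(20, 21.5, 24, 24.5, 23),
  d12 = c(21.5, 24, NA, 25, 22.5),
  d14 = c(23, 25.5, 26, 26.5, 23.5)
)

# Assumed x positions, taken from the column names d8, d10, d12, d14
occasion <- c(8, 10, 12, 14)

# Keep rows where sex is "F"; rows with missing sex are dropped
is_f <- !is.na(dati$sex) & dati$sex == "F"

# Numeric columns only, so the matrix stays numeric (one row per subject)
num <- as.matrix(dati[is_f, 3:6])

# matplot draws one line per column, so transpose to get one line per subject
matplot(occasion, t(num), type = "l", lty = 1, col = "green",
        xlab = "occasion", ylab = "measurement")
```

Because matplot opens the plot itself, no separate plot call is needed before drawing the lines.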

Note

We provide dati in reproducible form below. Note that the sex column only contains NA and F, so unless told otherwise read.table will assume those are a logical NA and FALSE. Instead we specify that the sex column is character in the read.table line.

Lines <- "
      id sex      d8   d10   d12   d14
 1     1 F      21    20    21.5  23  
 2     2 F      21    21.5  24    25.5
 3     3 NA     NA    24    NA    26  
 4     4 F      23.5  24.5  25    26.5
 5     5 F      21.5  23    22.5  23.5
 6     6 F      20    21    21    22.5
 7     7 F      21.5  22.5  23    25  
 8     8 F      23    23    23.5  24  
 9     9 F      NA    21    NA    21.5
10    10 F      16.5  19    19    19.5"
dati <- read.table(text = Lines, colClasses = list(sex = "character"))
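The type guess described in the Note can be seen on a tiny input. This is a sketch with a made-up three-row text (txt, guessed and declared are illustrative names), using a named character vector for colClasses.

```r
# Made-up input where the sex column contains only F and NA
txt <- "id sex
1 F
2 NA
3 F"

# Without colClasses, F is read as the logical value FALSE
guessed <- read.table(text = txt, header = TRUE)
class(guessed$sex)

# Declaring the column as character keeps "F" as a string
declared <- read.table(text = txt, header = TRUE,
                       colClasses = c(sex = "character"))
class(declared$sex)
```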

Upvotes: 1
