Matt Bannert

Reputation: 28264

Handling different vector lengths caused by na.omit in sapply?

I have a data.frame with several columns, some of which contain NAs. I want to run the following function, suggested by Farnsworth, over every single column:

hpfilter = function(x,lambda=1600){
   eye <- diag(length(x))
   result <- solve(eye+lambda*crossprod(diff(eye,lag=1,d=2)),x)
   return(result)
 }

I do so by:

test <- as.data.frame(sapply(vectorOfColumnNames,function(X) hpfilter(mydf[,X])))

which works fine as long as none of the columns contain NAs. If I add an na.omit to the function, it still works as long as every column contains the same number of NAs.

But how can I handle every column truly on its own and still end up with a data.frame that contains NAs where the input had NAs?

EDIT: I wonder whether there is a general solution to the problem of ending up with vectors of different length when running a function over apply. Maybe something similar to what is possible with data.table indexing.

Upvotes: 0

Views: 384

Answers (1)

Paul Hiemstra

Reputation: 60924

It is not completely clear to me what you want, but I'll give it a try.

Let's create some example data. Note that I use a matrix rather than a data.frame. Explicitly iterating over the column names is then no longer needed, which greatly simplifies the code.

m = matrix(runif(100), 10, 10)
apply(m, 2, hpfilter)
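
If the data has to stay in a data.frame, the same column-wise pattern should work without naming the columns, because a data.frame is a list of columns. A minimal sketch, assuming a hypothetical all-numeric data.frame `mydf` with no NAs; `hpfilter` is copied from the question (with `d=2` spelled out as `differences = 2`) so the block runs on its own:

```r
# hpfilter as given in the question
hpfilter <- function(x, lambda = 1600) {
  eye <- diag(length(x))
  solve(eye + lambda * crossprod(diff(eye, lag = 1, differences = 2)), x)
}

set.seed(1)
# Hypothetical example data: three numeric columns, no NAs
mydf <- data.frame(a = runif(10), b = runif(10), c = runif(10))

# vapply iterates over the columns and insists that every result
# has exactly nrow(mydf) numeric values
filtered <- as.data.frame(vapply(mydf, hpfilter, numeric(nrow(mydf))))
```

Using `vapply` instead of `sapply` is a design choice here: it fails loudly when a column returns a vector of a different length, rather than silently returning a list.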

And introduce some NA values:

m[sample(1:10, 2), sample(1:10, 2)] <- NA
apply(m, 2, hpfilter)

A tweak to the hpfilter function yields what I believe you are looking for:

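hpfilter = function(x,lambda=1600, na.omit = TRUE) {
   # Track NA positions; stays all FALSE when na.omit = FALSE
   na_values = rep(FALSE, length(x))
   if(na.omit) {
      na_values = is.na(x)
      x = x[!na_values]   # drop the NAs before filtering
   }
   eye <- diag(length(x))
   result <- solve(eye+lambda*crossprod(diff(eye,lag=1,d=2)),x)
   # Reinsert NAs at their original positions (ascending order keeps the indices valid)
   for(idx in which(na_values)) result = append(result, NA, idx - 1)
   return(result)
 }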

Essentially, the NAs are removed from the data before filtering. The filter is then computed from the remaining values, so the values on either side of an NA are treated as neighbours, e.g. the next or previous hour. Afterwards the NAs are reinserted at their original positions. You need to think carefully about whether this is how you want to deal with NAs: if there is a long run of consecutive NAs, the filter ends up treating pieces of the time series that are far apart as if they were adjacent.
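
On the more general question of functions that return shorter vectors once NAs are dropped: one common pattern is to allocate an NA-filled result of the input length and assign the function's output into the non-NA positions. A minimal sketch, not a definitive implementation. `apply_keep_na` and `max_na_run` are hypothetical helper names, `mydf` is made-up data, and the sketch assumes `f` returns exactly one numeric value per non-NA input (for `hpfilter`, at least three non-NA values per column):

```r
# hpfilter as given in the question (no NA handling of its own)
hpfilter <- function(x, lambda = 1600) {
  eye <- diag(length(x))
  solve(eye + lambda * crossprod(diff(eye, lag = 1, differences = 2)), x)
}

# Hypothetical helper: run f on the non-NA values, keep NAs in place
apply_keep_na <- function(x, f, ...) {
  ok  <- !is.na(x)
  out <- rep(NA_real_, length(x))          # result has the input's length
  if (any(ok)) out[ok] <- f(x[ok], ...)    # fill only the non-NA slots
  out
}

# Hypothetical helper: length of the longest run of consecutive NAs
max_na_run <- function(x) {
  r <- rle(is.na(x))
  if (any(r$values)) max(r$lengths[r$values]) else 0L
}

# Small check with a simple function
x <- c(1, NA, 2, 3, NA, 4)
cs <- apply_keep_na(x, cumsum)   # 1 NA 3 6 NA 10

# Applied column by column to a data.frame with different NA counts
set.seed(42)
mydf <- data.frame(a = runif(10), b = runif(10))
mydf$a[c(2, 7)] <- NA
mydf$b[5] <- NA
filtered <- as.data.frame(lapply(mydf, apply_keep_na, f = hpfilter))
na_runs  <- vapply(mydf, max_na_run, numeric(1))
```

Because every column comes back with the original number of rows, the results combine into a data.frame regardless of how many NAs each column has. `max_na_run` is only a quick way to spot columns where long gaps make the caveat above relevant.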

The output:

> m
           [,1]       [,2]       [,3]        [,4]      [,5]      [,6]
 [1,] 0.3492249 0.13243768         NA 0.302102537 0.4229100 0.5922950
 [2,] 0.2933371 0.20001802 0.03145775 0.429109073 0.9597172 0.9490127
 [3,] 0.7040072 0.49672438 0.22093906 0.323518480 0.4842678 0.4081306
 [4,] 0.9072993 0.86930200 0.52859786 0.122859661 0.1841663 0.5389729
 [5,] 0.3236061 0.38602856 0.46249498 0.866068888 0.6981199 0.9766099
 [6,] 0.4878379 0.31511419         NA 0.807535084 0.6563737 0.0419552
 [7,] 0.3244131 0.34287848 0.31360175 0.821228400 0.5989790 0.6631735
 [8,] 0.3758025 0.39728965 0.64960319 0.283663049 0.9054992 0.8160815
 [9,] 0.4485784 0.06440579 0.67518605 0.815575767 0.1479089 0.6391120
[10,] 0.9061172 0.16812244 0.86293095 0.005075972 0.6736308 0.7574890
             [,7]       [,8]      [,9]       [,10]
 [1,]          NA 0.02125704 0.7029417 0.490146887
 [2,] 0.353827474 0.40482437 0.2102700 0.351850122
 [3,] 0.778491744 0.32676623 0.6709055 0.953126856
 [4,] 0.825446342 0.24411303 0.4939415 0.026877439
 [5,] 0.264156057 0.30620799 0.0474103 0.505411467
 [6,]          NA 0.63995093 0.6155766 0.736349958
 [7,] 0.048948805 0.96751061 0.9697167 0.005304793
 [8,] 0.733419331 0.85554984 0.7438209 0.581133546
 [9,] 0.823691194 0.74550281 0.0635690 0.903188495
[10,] 0.009001798 0.74201923 0.3516963 0.904093070
> apply(m, 2, hpfilter)
           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
 [1,] 0.4337716 0.4101083        NA 0.4239194 0.5762643 0.6178718        NA
 [2,] 0.4512989 0.3950404 0.1219334 0.4367185 0.5756097 0.6219962 0.5909609
 [3,] 0.4687735 0.3797990 0.2209373 0.4494414 0.5748593 0.6261047 0.5593590
 [4,] 0.4860436 0.3640885 0.3198847 0.4620073 0.5741572 0.6303856 0.5276089
 [5,] 0.5031048 0.3476868 0.4187190 0.4742566 0.5735911 0.6348910 0.4956993
 [6,] 0.5202157 0.3306871        NA 0.4858177 0.5730049 0.6396161        NA
 [7,] 0.5375230 0.3132068 0.5175141 0.4965640 0.5723201 0.6447694 0.4638051
 [8,] 0.5551529 0.2953536 0.6163712 0.5065697 0.5715107 0.6501860 0.4319566
 [9,] 0.5730986 0.2772537 0.7152643 0.5161124 0.5705671 0.6557125 0.3999246
[10,] 0.5912411 0.2590969 0.8141878 0.5253298 0.5696884 0.6612990 0.3676684
           [,8]      [,9]     [,10]
 [1,] 0.1423571 0.5362741 0.3871990
 [2,] 0.2276829 0.5253623 0.4217619
 [3,] 0.3129329 0.5145546 0.4563892
 [4,] 0.3981423 0.5037583 0.4911015
 [5,] 0.4833547 0.4929783 0.5262298
 [6,] 0.5685175 0.4822135 0.5618152
 [7,] 0.6534674 0.4711843 0.5978857
 [8,] 0.7380857 0.4596942 0.6345782
 [9,] 0.8224501 0.4478587 0.6716594
[10,] 0.9067115 0.4359704 0.7088627

Upvotes: 4
