Reputation: 28264
I have a data.frame with several columns, some of which contain NAs. I want to run the following function, suggested by Farnsworth, over every single column:
hpfilter = function(x,lambda=1600){
  eye <- diag(length(x))
  result <- solve(eye+lambda*crossprod(diff(eye,lag=1,d=2)),x)
  return(result)
}
I do so by:
test <- as.data.frame(sapply(vectorOfColumnNames,function(X) hpfilter(mydf[,X])))
which works fine as long as none of the columns contain NAs. If I add an na.omit to the function, it still works, but only if every column has the same number of NAs.
But how can I handle every column truly on its own and still end up with a data.frame (one that contains NAs where the input had NAs)?
EDIT: I wonder whether there is a general solution to the problem of ending up with vectors of different lengths when running a function with apply. Maybe something similar to what is possible with data.table indexing.
Upvotes: 0
Views: 384
Reputation: 60924
It is not completely clear to me what you want, but I'll give it a try.
Let's create some example data. Note that I use a matrix and not a data.frame. Explicitly iterating over the column names is then not needed, which greatly simplifies the code.
m = matrix(runif(100), 10, 10)
apply(m, 2, hpfilter)
And introduce some NA values:
m[sample(1:10, 2), sample(1:10, 2)] <- NA
apply(m, 2, hpfilter)
A tweak to the hpfilter function yields the result I believe you are looking for:
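hpfilter = function(x, lambda = 1600, na.omit = TRUE) {
  # Remember where the NAs are so they can be put back afterwards.
  # With na.omit = FALSE nothing is dropped (na_values is all FALSE).
  na_values = if (na.omit) is.na(x) else rep(FALSE, length(x))
  x_obs = x[!na_values]
  # Filter only the observed values
  eye <- diag(length(x_obs))
  fitted <- solve(eye + lambda * crossprod(diff(eye, lag = 1, differences = 2)), x_obs)
  # Full-length result: NAs stay at their original positions
  result = rep(NA_real_, length(x))
  result[!na_values] = fitted
  return(result)
}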
Essentially, the NAs are removed from the data before filtering. The filter is then computed from the values surrounding each NA, e.g. the next or previous hour. Afterwards the NAs are reinserted at their original positions. You need to think carefully about whether this is the way you want to deal with NAs. If there is a long run of consecutive NAs, you end up applying the filter to pieces of the time series that are far apart.
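To judge whether long gaps are a concern, it may help to look at the longest run of consecutive NAs in each series before filtering. The following is only a sketch; `longest_na_run` is a made-up helper name, not something from the question or from a package.

```r
# Hypothetical helper: length of the longest run of consecutive NAs in x
longest_na_run <- function(x) {
  runs <- rle(is.na(x))                  # run-length encoding of the NA mask
  na_runs <- runs$lengths[runs$values]   # keep only the runs that are NA
  if (length(na_runs) == 0) 0L else max(na_runs)
}

x <- c(1.2, NA, NA, NA, 0.7, 0.9, NA, 1.1)
longest_na_run(x)  # 3

# For a matrix, the same check can be run per column
mm <- cbind(a = x, b = c(1, 2, 3, 4, 5, 6, 7, 8))
apply(mm, 2, longest_na_run)
```

What counts as "too long" depends on the data and the sampling interval, so any threshold would have to be chosen by hand.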
The output:
> m
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.3492249 0.13243768 NA 0.302102537 0.4229100 0.5922950
[2,] 0.2933371 0.20001802 0.03145775 0.429109073 0.9597172 0.9490127
[3,] 0.7040072 0.49672438 0.22093906 0.323518480 0.4842678 0.4081306
[4,] 0.9072993 0.86930200 0.52859786 0.122859661 0.1841663 0.5389729
[5,] 0.3236061 0.38602856 0.46249498 0.866068888 0.6981199 0.9766099
[6,] 0.4878379 0.31511419 NA 0.807535084 0.6563737 0.0419552
[7,] 0.3244131 0.34287848 0.31360175 0.821228400 0.5989790 0.6631735
[8,] 0.3758025 0.39728965 0.64960319 0.283663049 0.9054992 0.8160815
[9,] 0.4485784 0.06440579 0.67518605 0.815575767 0.1479089 0.6391120
[10,] 0.9061172 0.16812244 0.86293095 0.005075972 0.6736308 0.7574890
[,7] [,8] [,9] [,10]
[1,] NA 0.02125704 0.7029417 0.490146887
[2,] 0.353827474 0.40482437 0.2102700 0.351850122
[3,] 0.778491744 0.32676623 0.6709055 0.953126856
[4,] 0.825446342 0.24411303 0.4939415 0.026877439
[5,] 0.264156057 0.30620799 0.0474103 0.505411467
[6,] NA 0.63995093 0.6155766 0.736349958
[7,] 0.048948805 0.96751061 0.9697167 0.005304793
[8,] 0.733419331 0.85554984 0.7438209 0.581133546
[9,] 0.823691194 0.74550281 0.0635690 0.903188495
[10,] 0.009001798 0.74201923 0.3516963 0.904093070
> apply(m, 2, hpfilter)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.4337716 0.4101083 NA 0.4239194 0.5762643 0.6178718 NA
[2,] 0.4512989 0.3950404 0.1219334 0.4367185 0.5756097 0.6219962 0.5909609
[3,] 0.4687735 0.3797990 0.2209373 0.4494414 0.5748593 0.6261047 0.5593590
[4,] 0.4860436 0.3640885 0.3198847 0.4620073 0.5741572 0.6303856 0.5276089
[5,] 0.5031048 0.3476868 0.4187190 0.4742566 0.5735911 0.6348910 0.4956993
[6,] 0.5202157 0.3306871 NA 0.4858177 0.5730049 0.6396161 NA
[7,] 0.5375230 0.3132068 0.5175141 0.4965640 0.5723201 0.6447694 0.4638051
[8,] 0.5551529 0.2953536 0.6163712 0.5065697 0.5715107 0.6501860 0.4319566
[9,] 0.5730986 0.2772537 0.7152643 0.5161124 0.5705671 0.6557125 0.3999246
[10,] 0.5912411 0.2590969 0.8141878 0.5253298 0.5696884 0.6612990 0.3676684
[,8] [,9] [,10]
[1,] 0.1423571 0.5362741 0.3871990
[2,] 0.2276829 0.5253623 0.4217619
[3,] 0.3129329 0.5145546 0.4563892
[4,] 0.3981423 0.5037583 0.4911015
[5,] 0.4833547 0.4929783 0.5262298
[6,] 0.5685175 0.4822135 0.5618152
[7,] 0.6534674 0.4711843 0.5978857
[8,] 0.7380857 0.4596942 0.6345782
[9,] 0.8224501 0.4478587 0.6716594
[10,] 0.9067115 0.4359704 0.7088627
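On the more general question in the EDIT (functions that return vectors of different lengths per column), one possible approach is to keep the NA bookkeeping separate from the function itself. The sketch below is not from the original posts: `keep_na_positions`, `hpfilter_raw` and the example `mydf` are made-up names, and it assumes the wrapped function returns exactly one value per input value.

```r
# The filter from the question, without any NA handling
hpfilter_raw <- function(x, lambda = 1600) {
  eye <- diag(length(x))
  solve(eye + lambda * crossprod(diff(eye, lag = 1, differences = 2)), x)
}

# Hypothetical helper: run f on the non-NA values of x and put the
# results back at their original positions, leaving NA elsewhere.
# Assumes f returns one value per input value.
keep_na_positions <- function(x, f, ...) {
  ok <- !is.na(x)
  out <- rep(NA_real_, length(x))
  out[ok] <- f(x[ok], ...)
  out
}

# Made-up example data with a different number of NAs in each column
set.seed(1)
mydf <- data.frame(a = runif(10), b = runif(10), c = runif(10))
mydf$a[c(2, 7)] <- NA
mydf$b[5] <- NA

# Every column comes back at full length, so the list of results
# converts cleanly into a data.frame
test <- as.data.frame(
  lapply(mydf, function(col) keep_na_positions(col, hpfilter_raw))
)
```

Because each column is padded back to its original length, the columns may contain different numbers of NAs without breaking the conversion to a data.frame. The caveat about long runs of NAs applies here as well.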
Upvotes: 4