Reputation: 13
I need to calculate the average gap size of a univariate time-series data set. The imputeTS package generates plots from this data. Is it possible to extract the 'gap size' and the 'number of occurrences' from either statsNA or ggplot_na_gapsize?
Or is there any other way to find the average size of gaps in a time-series data set?
(You could use the tsNH4 data set from the imputeTS package.)
(This is my first time asking a question here and I'm fairly new to R.)
Upvotes: 1
Views: 205
Reputation: 7730
With the current CRAN version of imputeTS, you can only get the average gap size indirectly, with some extra work.
But I made a quick update to the development version on GitHub.
Now you can also get the average gap size with the statsNA function.
To use it, you first have to install the new version from GitHub (since it is not on CRAN yet):
library("devtools")
install_github("SteffenMoritz/imputeTS")
If you do not have "devtools" installed, install that package first:
install.packages("devtools")
Afterwards just use the imputeTS package as usual.
library("imputeTS")
#Example with the tsNH4 dataset
statsNA(tsNH4)
This now prints the following:
> statsNA(tsNH4)
[1] "Length of time series:"
[1] 4552
[1] "-------------------------"
[1] "Number of Missing Values:"
[1] 883
[1] "-------------------------"
[1] "Percentage of Missing Values:"
[1] "19.4%"
[1] "-------------------------"
[1] "Number of Gaps:"
[1] 155
[1] "-------------------------"
[1] "Average Gap Size:"
[1] 5.696774
[1] "-------------------------"
[1] "Stats for Bins"
[1] " Bin 1 (1138 values from 1 to 1138) : 233 NAs (20.5%)"
[1] " Bin 2 (1138 values from 1139 to 2276) : 433 NAs (38%)"
[1] " Bin 3 (1138 values from 2277 to 3414) : 135 NAs (11.9%)"
[1] " Bin 4 (1138 values from 3415 to 4552) : 82 NAs (7.21%)"
[1] "-------------------------"
[1] "Longest NA gap (series of consecutive NAs)"
[1] "157 in a row"
[1] "-------------------------"
[1] "Most frequent gap size (series of consecutive NA series)"
[1] "1 NA in a row (occuring 68 times)"
[1] "-------------------------"
[1] "Gap size accounting for most NAs"
[1] "157 NA in a row (occuring 1 times, making up for overall 157 NAs)"
As you can see, 'Number of Gaps' and 'Average Gap Size' are now part of the output.
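As a quick sanity check on these numbers: every missing value belongs to exactly one gap, so the average gap size should equal the number of missing values divided by the number of gaps. The small sketch below just redoes that arithmetic with the values copied from the printed output (the variable names are only illustrative):

```r
# Values copied from the printed statsNA(tsNH4) output
number_nas  <- 883   # "Number of Missing Values"
number_gaps <- 155   # "Number of Gaps"

# Average gap size = total NAs / number of gaps
avg_gap_size <- number_nas / number_gaps
avg_gap_size   # 5.696774, matching "Average Gap Size"
```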
You can also access the output as a variable:
library("imputeTS")
# To actually get an output object, set print_only to FALSE
out <- statsNA(tsNH4, print_only = FALSE)
# Average gap size
out$average_size_na_gaps
# Number of Gaps
out$number_na_gaps
#Number of NAs
out$number_NAs
These changes will also be part of the next CRAN release (thanks for the suggestion). Just be a little careful: this is a development version, so it is not as thoroughly tested as the CRAN version.
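If you would rather stay on the CRAN version, the indirect route mentioned at the start of this answer can be sketched in base R with rle(), which finds runs of consecutive values. This is only a minimal sketch, not how imputeTS computes it internally; na_gap_sizes is a hypothetical helper name, and the toy vector is made up for illustration:

```r
# Hypothetical helper (not part of imputeTS):
# returns the length of every run of consecutive NAs in x
na_gap_sizes <- function(x) {
  # as.vector() drops the ts attributes, which rle() would otherwise reject
  runs <- rle(is.na(as.vector(x)))
  runs$lengths[runs$values]
}

# Toy series with three gaps of sizes 1, 3 and 2
x <- c(1, NA, 2, 3, NA, NA, NA, 4, NA, NA)
gaps <- na_gap_sizes(x)

length(gaps)   # number of gaps
mean(gaps)     # average gap size
table(gaps)    # gap size vs. number of occurrences
```

For the data set from the question you would call na_gap_sizes(tsNH4) after library("imputeTS"). Assuming a gap is defined the same way, the number of gaps and the mean should agree with the statsNA output, and table() gives the 'gap size' and 'number of occurrences' asked about.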
Upvotes: 0