user10187475
user10187475

Reputation: 1

Stacked raster layers have different number of observations in R

I have trouble analysing MODIS NDVI data (Terra and Aqua), which was downloaded from AppEEARS.

The data is over a certain extent over a period of time, so I stack the different layers in one file.

library(raster)
setwd("~/NDVI/")
NDVI <- list.files(full.names = TRUE, pattern = ".tif$")
sNDVI <- stack(NDVI)

No issues here, but when I look at the number of observations per layer, they are different between layers. This makes further analyses between layers very difficult.

 layers <- 0; layers <- as.data.frame(layers)
    for (i in 1:NDVI@data@nlayers){
      layers[i,] <- length(rasterToPoints(subset(NDVI, i)))
    }

summary(layers[,1])
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
3330678 3340078 3342094 3341450 3343584 3345408

length(layers[,1])
[1] 142

I used to download the NDVI layers without problem from their previous system. Does anyone know what I am doing wrong, or how I could fix this? Cropping doesn't work.

Thank you for your help!

EDIT

The problem was caused by NAs in the data. I learned here how to visualise the NAs:

NDVI <- reclassify(NDVI, cbind(NA, 1000)) #or some value way higher than the rest of your data

plot(subset(NDVI, 1)

This revealed that the NAs were spread throughout my extent (I don't know why, as that was not the case with the previous data I downloaded). I needed to fill the NAs somehow and went for the less-than-optimal solution of filling them in with the value of the next non-NA value in that column:

library(zoo)
head(NDVI@data@values)
NDVI@data@values <- na.locf(NDVI@data@values, fromLast = T) # 'fromLast = T' makes the function take the value of the *next* rather than the *previous* value to fill in the NA.

Since the NAs made up less than 0.1 % of my total data, this didn't affect the mean value per layer I wanted to calculate.

This is probably not the way you want to deal with rasters, but thought I could be informative to edit my original question nonetheless. Cheers,

Upvotes: 0

Views: 632

Answers (1)

Jeffrey Evans
Jeffrey Evans

Reputation: 2397

This is a poor way to evaluate multivariate raster array data. The raster::rasterToPoints functions is dropping NA's thus, the differences in length. This is not a difference in n but, rather missing data.

The arrays match, otherwise you would receive an error with raster::stack. I would recommend applying common smoothing and nodata imputation procedures to this data. This will deal with the missing data, which is expected with this type of data due to effects such as clouds.

The development version of the spatialEco package has functions for raster timeseries data: smooth.time.series (NA imputation), sg.smooth (Savitzky-Golay smoothing) and raster.kendall (Kendall's Tau with Theil-Sen monotonic trend slope).

Upvotes: 1

Related Questions