Reputation: 63

Select values from a 4-dimensional R stars array using 2-dimensional array containing the dimension values to select

Maybe somebody can help me with this:

I have a 4-dimensional spatiotemporal stars dataset, which contains annual values of a variable for four different tree species:

library(stars)
# 4-dimensional stars data: x,y,time, species
dat <- st_as_stars(array(data = runif(2000), dim = c(10, 10, 5, 4))) |> 
  setNames("data_val") |>
  st_set_dimensions(names= c("x","y", "year", "tree_species")) |>
  st_set_dimensions("year", values = 2018:2022) |>
  st_set_dimensions("tree_species",values = c("bu", "fi", "ki", "ei"))
plot(dat)  # (only values of the first slice of tree_speciess ("bu") displayed)

As a second object I have a species identity map, which contains information on which pixel belongs to which species:

# 2-d species map (x,y)
species_map <- st_as_stars(matrix(sample(c("bu", "fi", "ki", "ei"),size = 100, replace = TRUE), 
                                  ncol = 10, byrow=T)) |> 
  setNames("tree_species") |> 
  st_set_dimensions(c(1,2), names= c("x","y"))
species_map$tree_species <- factor(species_map$tree_species)
plot(species_map, col = c("#fdc086","#ffff99","#7fc97f","#beaed4"))

Based on this species_map, I want to extract pixel by pixel (x/y) the timeseries from the data that matches tree_species. The resulting stars-object would be a 3-d array (as the 4th data dimension (tree_species) was collapsed), with the data-values in (x,y,year) coming from the respective species entry. So, in the upper-left corner of the resulting stars-array (species ei,ki,ei,bu) there would be the data-values from the 4th-dimension indexes (4,3,4,1).

All my attempts so far failed... I tried to use ifelse(), which resulted in the right dimensionality, however it did not obtain the right results. I also tried to match values, but this also failed:

dat$data_val[, , , species_map$tree_species]

I guess my problem can be solved easily with some base R commands.

Thank you for your help.

All the best Paul

Upvotes: 0

Answers (3)

paulsw

Reputation: 63

Based on the previous posts, I managed to solve this myself. Thank you for your help!

First we need to translate tree_species to the 4th dimension index in dat (like in Allan's for loop, see above), and then use JMenezes' indexmatrix, that just needed some little adjustments:

# create tree_species index grid
# (-> where to find the right tree_species in dat)
species_map$tree_species_ind <- matrix(match(species_map$tree_species,
                                             st_get_dimension_values(dat, 4)), 
                                       nrow= 10, ncol = 10)

# create index matrix 
indexmatrix <- expand.grid(x=1:10, y = 1:10, year = 1:5) # Quickway to generate all time-x-y combinations, recycling by time first
indexmatrix$tree_species <- rep(species_map$tree_species_ind,5) # Add species
indexmatrix <- as.matrix(indexmatrix)

# prepare stars object to hold the results
res <- dat[,,,,1,drop =TRUE] # copy one slice of tree_species from dat-array, and drop dimension

# extract results
res$data_val <- dat$data_val[indexmatrix]
plot(res)

Verification

It's diffcult to check wether everything works as expected with my initial example of random data. Therefore I create a second example of 'stars'-data with a fixed set of numbers varying along dimension 4. In the result, one pixel should finally hold the same value for each year. Also, I change from a square to rectangle map, to check if the rows and columns dimensions are alright.

# data with 4 fixed numbers varying along tree_species-dimension:
sample_numbers <- sample(1:10, size = 4)
dat <- st_as_stars(array(data = rep(sample_numbers,each = 5*7*3), 
                          dim = c(5, 7, 3, 4))) |> 
  setNames("data_val") |>
  st_set_dimensions(names= c("x","y", "year", "tree_species")) |>
  st_set_dimensions("year", values = 2018:2020) |>
  st_set_dimensions("tree_species",values = c("bu", "fi", "ki", "ei"))


# 2-d species map (x,y)
species_map <- st_as_stars(array(sample(c("bu", "fi", "ki", "ei"),
                                         size = 5*7, 
                                         replace = TRUE), 
                                  dim = c(5,7))) |> 
  setNames("tree_species") |> 
  st_set_dimensions(c(1,2), names= c("x","y"))
species_map$tree_species_f <- factor(species_map$tree_species)
plot(species_map["tree_species_f"], col = c("#fdc086","#ffff99","#7fc97f","#beaed4"))

# create index grid
species_map$tree_species_ind <- matrix(match(species_map$tree_species,
                                             st_get_dimension_values(dat, 4)), 
                                       nrow= 5, ncol = 7)

# create index matrix
indexmatrix = expand.grid(x=1:5, y = 1:7, year = 1:3) # Quickway to generate all time-x-y combinations, recycling by time first
indexmatrix$tree_species = rep(species_map$tree_species_ind,3) # Add species
indexmatrix = as.matrix(indexmatrix)

res <- dat[,,,,1,drop =T] # copy one tree_species-slice from dat-array, and drop dimension
res$data_val <- dat$data_val[indexmatrix]
res$data_val_f <- as.factor(res$data_val) # convert to factor for plotting

# Verification
plot(res["data_val_f"]) # results plot: same pattern for each year, as expected
rbind(sample_numbers, st_get_dimension_values(dat, 4)) # Check which number maps to  which tree_species
plot(species_map["tree_species_f"], col = c("#fdc086","#ffff99","#7fc97f","#beaed4")) # plot species map to compare

Upvotes: 0

JMenezes

Reputation: 1059

You can provide a matrix as an index to a multidimensional array. Let me demonstrate with a simpler problem:

test = array(1:27,c(3,3,3))
print(test)

I'll take numbers "22" and "6" out of this array using a matrix. Their indexes are: (1,2,3) and (3,2,1).

indexmatrix = rbind(c(1,2,3),(3,2,1))
test[indexmatrix]

NOTE: there are no commas in the [] brackets. Adding commas will silently convert the matrix to a vector. Also it must be a matrix. Data.frames will not work.

Applying to your problem:

indexmatrix = expand.grid(1:5,1:10,1:10) # Quickway to generate all time-x-y combinations, recycling by time first
indexmatrix$species = species_map$tree_species # Add species
indexmatrix = indexmatrix[,c(2,3,1,4)] # Reorder to match dimensions
indexmatrix = as.matrix(indexmatrix)

dat$data_val[indexmatrix]

Upvotes: 1

Allan Cameron

Reputation: 173793

Perhaps there's a simpler way, but you could achieve this via a loop:

res <- array(0, dim = c(10, 10, 5))

for(i in seq(dim(dat)[1])) {
  for(j in seq(dim(dat)[2])) {
    spe <- species_map$tree_species[i,j]
    spe <- match(spe, attributes(dat)$dimensions$tree_species$values)
    res[i,j,] <- dat$data_val[i,j,,spe]
  }
}

st_as_stars(res) |> 
  setNames("data_val") |>
  st_set_dimensions(names= c("x","y", "year")) |>
  st_set_dimensions("year", values = 2018:2022)
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>                   Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#> data_val  0.0006052661 0.2296937 0.4519045 0.4826458 0.7461618 0.9963633
#> dimension(s):
#>      from to offset delta point x/y
#> x       1 10      0     1 FALSE [x]
#> y       1 10      0     1 FALSE [y]
#> year    1  5   2018     1 FALSE

Upvotes: 1

Select values from a 4-dimensional R stars array using 2-dimensional array containing the dimension values to select

Answers (3)

Verification

Related Questions