Reputation: 33

lidR lascatalog processing engine transformation outputs differ between sequential and parallel processing

I'm using the lidR package (version 4.1.2) to process ~2000 *.laz point cloud files from airborne laser scanning. One step includes transforming the coordinate system from Gauß-Krüger (EPSG 31468) to UTM 32 (EPSG 25832). To do this I want to use the de_adv_BETA2007 transformation grid instead of the default option to get best possible results. So I use st:transform() of the sf package (version 1.0-19) with the corresponding specific proj pipeline argument. It does work fine and I can see a little (~0.5 m) shift, which confirms, that the transformation is working better. Also, comparing the results with ALS data from the bavarian surveying agency confirms a better geolocation.

So, naturally, after testing with a small subset of the data I want to transform all datasets. To save time, I wanted to use the lascatalog processing engine implemented into lidR together with the future package to parallelize the processing and save time. It turns out, that whenever I apply the parallelization by running plan(multisession, workers = 6), st_transform is not using the BETA2007 method but the default one, introducing the 0.5m shift. Whenever plan(multisession, workers = 6) is not run, the processing is done sequentially and the results are correct.

Here is the full code I've used:

library(lidR)
library(sf)
library(mapview)
library(future)

#' check set up of sf package:
sf_proj_search_paths()
sf_extSoftVersion()
sf_proj_network(enable = T)

#' get transformation options available with grid:
options <- sf_proj_pipelines(source_crs = "EPSG:31468", target_crs = "EPSG:25832")

#' select correct pipeline:
pipeline_BETA2007 <- options[1,]$definition

#' set up parallel processing with 6 cores:
plan(multisession, workers = 6)

#' read Lascatalog:
ctg <- readALSLAScatalog("../Data/GK")

#' apply epsg code, as it is not specified in the laz headers:
st_crs(ctg) <- 31468

#' check LAScatalog vailidity:
ctg
las_check(ctg)

#' plot LAScatalog:
plot(ctg, mapview = TRUE)

#' define output location and structure of catalog:
opt_output_files(ctg) <- ".../Data/UTM_parallel/{ORIGINALFILENAME}_UTM"
opt_laz_compression(ctg) <- TRUE
opt_chunk_buffer(ctg) <- 0
opt_chunk_size(ctg) <- 0

#' function to reproject las data:
reproject_catalog = function(las)
{
  las_trans = sf::st_transform(las, crs = 25832, pipeline = pipeline_BETA2007)
  return(las_trans)
}

#' apply function to catalog:
reprojected_ctg = catalog_map(ctg, reproject_catalog)

Of course the only code that was altered when running it without parallelization was to simply comment out the plan(multisession, workers = 6) part and adjusting the output file location. The data (selection of 4 tiles) including inputs and outputs can be downloaded here: https://workupload.com/file/N9dHMZ9TcC7

Here is a little more information on my processing environment:

> R.version
               _                                
platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          4.1                              
year           2024                             
month          06                               
day            14                               
svn rev        86737                            
language       R                                
version.string R version 4.4.1 (2024-06-14 ucrt)
nickname       Race for Your Life

> sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.12.2"        "3.9.3"        "9.4.1"         "true"         "true"        "9.4.1"

I've tried different things but none were successful. At no time a warning or error was generated. I will just use bpapply in combination with parallel as an workaround to do the processing but it should be possible to do it via lidR. If I'm overlooking something obvious, I'm sorry.

Upvotes: 1

Answers (2)

JayDXQ

Reputation: 33

So, after trying out a few things I actually ended up with a solution to my specific problem. As pointed out by @JRR, the pipeline argument does not work with LidR LAS objects as specify it directly in the function but it just does not work. However, you can still use different transformation pipelines, they just need to be the default option with st_transform. So when calling sf::st_transform(las, crs = 25832), it uses the method that is at the first place when looking at the options through options <- sf_proj_pipelines(source_crs = "EPSG:31468", target_crs = "EPSG:25832"). In my case it is a transformation that makes use of an NTv2 file, which can only be accessed through the connection to the network with sf_proj_network(enable = T), however I tested a few times and the only way I could use it with the Lascatalog processing engine was to actually put the file itself in the directory of st_proj_search_paths().

Upvotes: 0

JRR

Reputation: 3253

sf does not have st_transform for LAS objects from the lidR package. The lidRversion inherits its own functionst_transformand because of some inheritance rules it must be call thoughtsf::st_transform. The lidR version of st_transformis simpler than thesf` one. It does not have a pipeline argument, so your code does not work either in sequential or parallel.

For your code to work, I should update lidR. You cannot simply convert to POINT, It is a little more comlex than that. Below is a modified version of the lidR code where I'm using the ellipsis ... to propagate sf options. With this version, your pipeline argument should be passed to the actual sf::st_transform version.

As you can see lidR uses MULTIPOINT for efficiency and there is a lot of code to maintain header and quantization valid with respect to LAS specifications

st_transform.LAS <- function(x, crs, ...)
{
  if (is.na(st_crs(x)))
    stop("No transformation possible from NA reference system")
  
  p <- list(...)
  
  # Transform the point coordinates
  sfpts <- sf::st_multipoint(as.matrix(sf::st_coordinates(x)))
  sfpts <- sf::st_geometry(sfpts)
  sfpts <- sf::st_set_crs(sfpts, st_crs(x))
  sfpts <- sf::st_transform(sfpts, crs, ...)
  geom  <- sf::st_geometry(sfpts)
  X <- geom[[1]][,1]
  Y <- geom[[1]][,2]
  Z <- geom[[1]][,3]
  
  # Update the offsets in the header
  offsetx <- floor(min(X))
  offsety <- floor(min(Y))
  offsetz <- floor(min(Z))
  if (!is.null(p$xoffset)) offsetx <- p$xoffset
  if (!is.null(p$yoffset)) offsety <- p$yoffset
  if (!is.null(p$zoffset)) offsetz <- p$zoffset
  x@header@PHB[["X offset"]] <- offsetx
  x@header@PHB[["Y offset"]] <- offsety
  x@header@PHB[["Z offset"]] <- offsetz
  
  # Update the scale factors
  scalex <- x[["X scale factor"]]
  scaley <- x[["Y scale factor"]]
  scalez <- x[["Z scale factor"]]
  if (!is.null(p$scale)) scalex <- scaley <- scalez <- p$scale
  x@header@PHB[["X scale factor"]] <- scalex
  x@header@PHB[["Y scale factor"]] <- scaley
  x@header@PHB[["Z scale factor"]] <- scalez
  
  # Quantize the coordinates
  quantize(X, scalex, offsetx)
  quantize(Y, scaley, offsety)
  quantize(Z, scalez, offsetz)
  
  # Update the LAS object
  x@data[["X"]] <- X
  x@data[["Y"]] <- Y
  x@data[["Z"]] <- Z
  
  x <- las_update(x)
  st_crs(x) <- crs
  
  return(x)
}

Upvotes: 0

lidR lascatalog processing engine transformation outputs differ between sequential and parallel processing

Answers (2)

Related Questions