sf & dplyr: Grouped mean by same coordinates doesn't work

Question

I have a data set extracted from S2 satellite rasters (10 x 10 meters) with 12 values (ras.df.ll), but 6 is in a tile (T21JYG) and the second in another (T21JYG). I'd like to calculate the mean in the same (x,y coordinates) between tiles without success. I don't find any way to recognise 1st row in the first tile is the same coordinate of 1st row in the second tile just the end of my data set. In my example:

library(sf)
library(sfheaders)
library(dplyr)

# Raster (10x10 meters) info in data frame 1 - crs +proj=utm +zone=21 +south +datum=WGS84 +units=m +no_defs
x <-c(789385,789395,789405,789415,789425,789435)
y <-c(6626865,6626865,6626865,6626865,6626865,6626865)
tile <- rep("T21JYG",6)
values <-c(321,249,234,238,224,244)
ras.ds1<-data.frame(x,y,tile,values)
ras.ds1.sf <- st_as_sf(ras.ds1, coords = c("x", "y"), crs = 32721, agr = "constant")
ras.ds1.sf.ll <- st_transform(ras.ds1.sf, crs=4326)
ras.ds1.sf.ll
#Simple feature collection with 6 features and 2 fields
#Attribute-geometry relationship: 2 constant, 0 aggregate, 0 identity
#Geometry type: POINT
#Dimension:     XY
#Bounding box:  xmin: -53.98638 ymin: -30.45564 xmax: -53.98586 ymax: -30.45562
#Geodetic CRS:  WGS 84
#    tile values                    geometry
# 1 T21JYG    321 POINT (-53.98638 -30.45564)
# 2 T21JYG    249 POINT (-53.98628 -30.45563)
# 3 T21JYG    234 POINT (-53.98617 -30.45563)
# 4 T21JYG    238 POINT (-53.98607 -30.45563)
# 5 T21JYG    224 POINT (-53.98596 -30.45563)
# 6 T21JYG    244 POINT (-53.98586 -30.45562)

# Raster (10x10 meters) info in data frame - crs +proj=utm +zone=22 +south +datum=WGS84 +units=m +no_defs 
x <-c(213285,213295,213305,213315,213325,213335)
y <-c(6626955,6626955,6626955,6626955,6626955,6626955)
tile <- rep("T22JBM",6)
values <-c(336,355,363,426,341,308)
ras.ds2 <-data.frame(x,y,tile,values)
ras.ds2.sf <- st_as_sf(ras.ds2, coords = c("x", "y"), crs = 32722, agr = "constant")
ras.ds2.sf.ll <- st_transform(ras.ds2.sf, crs=4326)
ras.ds2.sf.ll
# Simple feature collection with 6 features and 2 fields
# Attribute-geometry relationship: 2 constant, 0 aggregate, 0 identity
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: -53.98638 ymin: -30.45564 xmax: -53.98586 ymax: -30.45562
# Geodetic CRS:  WGS 84
#     tile values                    geometry
# 1 T21JYG    321 POINT (-53.98638 -30.45564)
# 2 T21JYG    249 POINT (-53.98628 -30.45563)
# 3 T21JYG    234 POINT (-53.98617 -30.45563)
# 4 T21JYG    238 POINT (-53.98607 -30.45563)
# 5 T21JYG    224 POINT (-53.98596 -30.45563)
# 6 T21JYG    244 POINT (-53.98586 -30.45562)


# Join information
ras.ds.sf.ll <- rbind(ras.ds1.sf.ll, ras.ds2.sf.ll)
ras.df.ll <- sf_to_df(ras.ds.sf.ll, fill = TRUE, unlist = NULL)


# Mean values by tile 
ras.df.ll %>% 
  group_by(x,y) %>% dplyr::summarise(values=mean(values))
# `summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 12 x 3
# # Groups:   x [12]
#        x     y values
#       
#  1 -54.0 -30.5    321
#  2 -54.0 -30.5    249
#  3 -54.0 -30.5    234
#  4 -54.0 -30.5    238
#  5 -54.0 -30.5    224
#  6 -54.0 -30.5    244
#  7 -54.0 -30.5    336
#  8 -54.0 -30.5    355
#  9 -54.0 -30.5    363
# 10 -54.0 -30.5    426
# 11 -54.0 -30.5    341
# 12 -54.0 -30.5    308


# Nothing happened!!

# But if I try to change x and y values using accuracy
round_any = function(x, accuracy, f=round){f(x/ accuracy) * accuracy}
ras.df.ll2 <- ras.df.ll %>%
  mutate(x = round_any(x, accuracy = 0.001),
         y = round_any(y, accuracy = 0.001))
ras.df.ll2 %>% 
  group_by(x,y) %>% dplyr::summarise(values=mean(values))
`summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
# # A tibble: 3 x 3
# # Groups:   x [2]
#       x     y values
#      
# 1 -54.0 -30.5   252.
# 2 -54.0 -30.5   370 
# 3 -54.0 -30.5   324.

# Hapens something but is wrong!!

Please, is there any way to make this corrected mean extraction? Thanks in advance Alexandre

Marek Fiołka · Accepted Answer

Let's start by noticing that what you wrote is unnecessarily complicated. I suggest you do it like this.

fst = function(data)
  st_as_sf(data, coords = c("x", "y"),
           crs = data$crs, agr = "constant") %>%
  st_transform(crs=4326) %>%
  sf_to_df(fill = TRUE, unlist = NULL)

df = tibble(
    tile = rep(c("T21JYG", "T22JBM"), each = 6) %>% fct_inorder(),
    values  = c(321,249,234,238,224,244,336,355,363,426,341,308),
    x = c(789385,789395,789405,789415,789425,789435,
          213285,213295,213305,213315,213325,213335),
    y = c(6626865,6626865,6626865,6626865,6626865,6626865,
          6626955,6626955,6626955,6626955,6626955,6626955),
    crs = rep(c(32721, 32722), each = 6)
  ) %>% group_by(tile) %>%
    nest(data=x:crs) %>%
    mutate(st = map(data, ~ fst(.x))) %>%
    unnest(st) %>% 
    mutate(
      x = x %>% plyr::round_any(accuracy = 0.001) %>% paste(),
      y = y %>% plyr::round_any(accuracy = 0.001) %>% paste(),
    ) %>% group_by(x,y)

output

# A tibble: 12 x 8
# Groups:   x, y [3]
   tile   values data               crs sfg_id point_id x       y      
                              
 1 T21JYG    321  32721      1        1 -53.986 -30.456
 2 T21JYG    249  32721      1        1 -53.986 -30.456
 3 T21JYG    234  32721      1        1 -53.986 -30.456
 4 T21JYG    238  32721      1        1 -53.986 -30.456
 5 T21JYG    224  32721      1        1 -53.986 -30.456
 6 T21JYG    244  32721      1        1 -53.986 -30.456
 7 T22JBM    336  32722      1        1 -53.986 -30.455
 8 T22JBM    355  32722      1        1 -53.986 -30.455
 9 T22JBM    363  32722      1        1 -53.986 -30.455
10 T22JBM    426  32722      1        1 -53.986 -30.455
11 T22JBM    341  32722      1        1 -53.985 -30.455
12 T22JBM    308  32722      1        1 -53.985 -30.455

However, I completely do not understand what you see incorrect in it

df %>% dplyr::summarise(values=mean(values))

output

# A tibble: 3 x 3
# Groups:   x [2]
  x       y       values
         
1 -53.985 -30.455   324.
2 -53.986 -30.455   370 
3 -53.986 -30.456   252.

This is exactly what we do, which is the mean of values in the x, y groups. Make clear what you want to achieve. And don't write it in the comments, but in the body of the post!

Hmm. I don't know anything about the functions st_as_sf, st_transform, sf_to_df. I don't know what they do or how to interpret their results. Note, however, that they are placed in a pipeline (in my fst function) so they do the necessary calculations and then pass the results to each other. These results must therefore be correct.

Perhaps the problem is that you originally specified two different crs parameters in the st_as_sf function. I did this by putting this variable in tibble. However, you only pass one value of crs = 4326 to the st_transform function. Perhaps two different values should also be conveyed here?

Finally, I got six results when I set the accuracy in the round_any function to 0.00025.

Take a look at these answers.

# A tibble: 6 x 3
# Groups:   x [6]
  x         y         values
             
1 -53.98525 -30.4555    308 
2 -53.9855  -30.4555    377.
3 -53.98575 -30.4555    312.
4 -53.986   -30.45575   231 
5 -53.98625 -30.45575   242.
6 -53.9865  -30.45575   321

In conclusion, I showed you the correct way to program it.

sf & dplyr: Grouped mean by same coordinates doesn't work

Answers (1)

Related Questions

sf &amp; dplyr: Grouped mean by same coordinates doesn&#39;t work

Answers (1)

Related Questions

sf & dplyr: Grouped mean by same coordinates doesn't work