RxT

Reputation: 548

Mean distance from one city to others by a mutual ID

I don't understand spatial data at all. I have been studying it, but I'm missing something.

What I have: a data.frame, enterprises, with the columns id, parent_subsidiary and city_cod.

What I need: for each id, the mean and the maximum distance from the parent's city to its subsidiaries' cities.

Example:

    id         |     mean_dist     | max_dist
 1111          |         25km      |     50km    
 232           |        110km      |    180km  
 333           |          0km      |      0km  

What I did:

library("tidyverse")
library("sf")
# library("brazilmaps")   not working anymore
library("geobr")


parent <- enterprises %>% filter(parent_subsidiary==1) 
subsidiary <- enterprises %>% filter(parent_subsidiary==2) 

# Cities - polygons 
m_city_br <- read_municipality(code_muni="all", year=2019)

# or shp_city<- st_read("/BR_Municipios_2019.shp")

# data.frame with the column geom
map_parent  <- left_join(parent, m_city_br, by=c("city_cod"="code_muni"))
map_subsidiary <- left_join(subsidiary, m_city_br, by=c("city_cod"="code_muni"))



st_distance(map_parent$geom[1],map_subsidiary$geom[2]) %>% units::set_units(km)
# this took a long time, and the result is different from Google Maps
# is it OK?


# To do this by ID -- I also got stuck here

# column names must match the ones used in rbind() below
distance_p_s <- data.frame(id=character(),subsidiary=numeric(),mean_dist=numeric(),max_dist=numeric())

id_v <- as.vector(parent$id)

for (i in seq_along(id_v)){

  test_p <- map_parent %>% filter(id==id_v[i])
  test_s <- map_subsidiary %>% filter(id==id_v[i])
  total <- 0
  max_d <- 0   # avoid masking the base max() function

  l <- nrow(test_s)

  # seq_len() skips the loop when an id has no subsidiaries (1:0 would run twice)
  for (j in seq_len(l)){

    value <- as.numeric(round(st_distance(test_p$geom[1],test_s$geom[j]) %>% units::set_units(km),2))

    total <- total + value
    if (value > max_d) max_d <- value
  }

  mean_dist <- total/l
  done <- data.frame(id=id_v[i],subsidiary=l,mean_dist=round(mean_dist,2),max_dist=max_d)
  distance_p_s <- rbind(distance_p_s,done)
}
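The inner loop and the row-by-row rbind() can be replaced by a grouped summary once the distances are in a long table with one row per parent/subsidiary pair. A minimal base-R sketch, assuming such a table already exists; the column names id and dist_km and the numbers below are hypothetical:

```r
# Hypothetical long table: one row per parent/subsidiary pair, distance in km
pairs <- data.frame(
  id      = c("1111", "1111", "1111", "232", "232", "333"),
  dist_km = c(20, 30, 50, 40, 110, 0)
)

# One summary vector per id: number of subsidiaries, mean and max distance
stats <- lapply(split(pairs$dist_km, pairs$id), function(d) {
  c(n_subsidiary = length(d), mean_dist = mean(d), max_dist = max(d))
})

# Stack the vectors into a data frame; row.names = NULL gives plain 1..n row names
result <- data.frame(id = names(stats), do.call(rbind, stats), row.names = NULL)
result
```

Ids with no subsidiaries simply do not appear in the result, instead of producing 0/0.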



Is it right? Can I calculate the centroid of each city and then calculate the distance?
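Reducing each city to one point (for example with st_centroid()) and then measuring point-to-point distances is a common approach, and it is much cheaper than polygon-to-polygon distances. For points in longitude/latitude, the straight-line distance can be sketched with the haversine formula. This assumes a spherical Earth with mean radius 6371 km; sf's own geodesic computations use a different Earth model, so small differences are expected. It is also a straight-line distance, so it will generally be shorter than a driving distance from Google Maps.

```r
# Great-circle (haversine) distance in km between points given in decimal degrees.
# Assumes a spherical Earth with mean radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# One degree of latitude along a meridian: about 111.19 km
haversine_km(-51, -29, -51, -30)
```

The function is vectorised, so it can be applied to whole columns of parent and subsidiary centroid coordinates at once.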

I noticed that the distance from code_muni == 4111407 to code_muni == 4110102 is 0, even though they are different cities (Imbituva, PR, Brazil and Ivaí, PR, Brazil). Why?
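For polygons, st_distance() returns the shortest distance between the two geometries, so two municipalities that share a border get a distance of 0 (presumably the case for Imbituva and Ivaí). A minimal sketch with two hypothetical unit squares in planar coordinates (no CRS) shows the difference from a centroid-to-centroid distance:

```r
library(sf)

# Two hypothetical unit squares that share the edge x = 1 (planar, no CRS)
sq_a <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq_b <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
polys <- st_sfc(sq_a, sq_b)

# Polygon-to-polygon distance: shortest gap between the shapes, 0 when they touch
d_poly <- st_distance(polys[1], polys[2])

# Centroid-to-centroid distance: (0.5, 0.5) to (1.5, 0.5)
cents <- st_centroid(polys)
d_cent <- st_distance(cents[1], cents[2])

d_poly
d_cent
```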

Data example:

structure(list(id = c("1111", "1111", "1111", "1111", "232", "232", "232", "232", "3123", "3123", "4455", "4455", "686", "333", "333", "14112", "14112", "14112", "3633", "3633"), parent_subsidiary = c("1","2", "2", "2", "1", "2", "2", "2", "1", "2", "1", "2", "1", "2", "1", "1", "2", "2", "1", "2"), city_cod = c(4305801L,4202404L, 4314803L, 4314902L, 4318705L, 1303403L, 4304507L, 4314100L, 2408102L, 3144409L, 5208707L, 4205407L, 5210000L, 3203908L, 3518800L, 3118601L, 4217303L, 3118601L, 5003702L, 5205109L)), row.names = c(NA, 20L), class = "data.frame")

PS: these are Brazilian municipalities, from the geobr package: https://github.com/ipeaGIT/geobr/tree/master/r-package
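With the data example above, each subsidiary can be paired with its parent's city in a single join, giving one row per parent/subsidiary pair on which distances can then be computed. A minimal base-R sketch; the _p/_s suffixes are naming choices made here, not part of the question:

```r
enterprises <- structure(list(id = c("1111", "1111", "1111", "1111", "232", "232", "232", "232", "3123", "3123", "4455", "4455", "686", "333", "333", "14112", "14112", "14112", "3633", "3633"),
  parent_subsidiary = c("1","2", "2", "2", "1", "2", "2", "2", "1", "2", "1", "2", "1", "2", "1", "1", "2", "2", "1", "2"),
  city_cod = c(4305801L,4202404L, 4314803L, 4314902L, 4318705L, 1303403L, 4304507L, 4314100L, 2408102L, 3144409L, 5208707L, 4205407L, 5210000L, 3203908L, 3518800L, 3118601L, 4217303L, 3118601L, 5003702L, 5205109L)),
  row.names = c(NA, 20L), class = "data.frame")

parent     <- enterprises[enterprises$parent_subsidiary == "1", c("id", "city_cod")]
subsidiary <- enterprises[enterprises$parent_subsidiary == "2", c("id", "city_cod")]

# Inner join by id: one row per parent/subsidiary pair.
# Ids without subsidiaries (here "686") are dropped.
pairs <- merge(parent, subsidiary, by = "id", suffixes = c("_p", "_s"))
nrow(pairs)

# Pairs where the subsidiary is in the parent's own city (distance 0)
pairs[pairs$city_cod_p == pairs$city_cod_s, ]
```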

Upvotes: 0

Views: 77

Answers (2)

RxT

Reputation: 548

I did something like this:


distance_p_s <- data.frame(id=character(),
                           qtd_subsidiary=numeric(),
                           dist_min=numeric(),
                           dist_media=numeric(),
                           dist_max=numeric())

# Use a name that is not also a column: inside filter(), "id" refers to the
# column, so filter(id == id[i]) would compare the column with its own i-th value
id_v <- as.vector(mparentid$id)

for (i in seq_along(id_v)){

  message("Filtering id: ", id_v[i], " (", i, " of ", length(id_v), ")")
  teste_m <- mparentid %>% filter(id == id_v[i]) %>% st_as_sf()
  teste_f <- msubsidiaryid %>% filter(id == id_v[i]) %>% st_as_sf()

  # one point per city instead of the full polygon
  teste_f <- st_centroid(teste_f)
  teste_m <- st_centroid(teste_m)

  # SIRGAS 2000 (EPSG:4674)
  teste_f <- st_transform(teste_f, 4674)
  teste_m <- st_transform(teste_m, 4674)

  total <- 0
  min_d <- 0
  max_d <- 0
  l <- nrow(teste_f)

  for (j in seq_len(l)){
    message("Processing id: ", id_v[i], " (", i, " of ", length(id_v), "), subsidiary: ", j, " of ", l)

    value <- as.numeric(round(st_distance(teste_m$geom[1], teste_f$geom[j]) %>% units::set_units(km), 2))

    total <- total + value
    if (value > max_d) max_d <- value
    if (j == 1 || value < min_d) min_d <- value
  }

  dist_med <- total/l
  done <- data.frame(id=id_v[i], qtd_subsidiary=l, dist_min=min_d, dist_media=round(dist_med,2), dist_max=max_d)
  distance_p_s <- rbind(distance_p_s, done)

  message("Finished id: ", id_v[i], " (", i, " of ", length(id_v), "), subsidiaries: ", l)
}

Probably this is not the best way, but it solved my problem for now.
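A possible simplification, sketched on toy data rather than tested on the real geobr layers: once every subsidiary row is paired with its parent's point, st_distance(x, y, by_element = TRUE) compares row i of x with row i of y only. All pair distances then come from one call, and min/mean/max per id follow without the inner loop. The coordinates below are hypothetical points in SIRGAS 2000 (EPSG:4674), the CRS used in the answer:

```r
library(sf)

# Hypothetical pairs: parent point (lon_p, lat_p) and subsidiary point (lon_s, lat_s)
pairs <- data.frame(
  id    = c("1111", "1111", "232"),
  lon_p = c(-51.0, -51.0, -52.0), lat_p = c(-29.0, -29.0, -30.0),
  lon_s = c(-51.0, -51.5, -52.0), lat_s = c(-29.5, -29.0, -30.0)
)
p <- st_as_sf(pairs, coords = c("lon_p", "lat_p"), crs = 4674)
s <- st_as_sf(pairs, coords = c("lon_s", "lat_s"), crs = 4674)

# Row-wise distances (no full distance matrix); geographic CRS -> metres
pairs$dist_km <- as.numeric(st_distance(p, s, by_element = TRUE)) / 1000

# Per-id summary: count, min, mean and max distance (stored as a matrix column)
res <- aggregate(dist_km ~ id, data = pairs,
                 FUN = function(d) round(c(n = length(d), min = min(d),
                                           mean = mean(d), max = max(d)), 2))
res
```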

Upvotes: 0

Gray

Reputation: 1388

Great problem. I looked at it for a while, then came back to it after thinking it over. I did not calculate the mean; I only determined the distances from each parent to its subsidiaries.

I joined the cities data with the data frame, then mutated the new df to add a centroid point for each city.

I split the df by id, which gave a list of 8 data frames, each containing one parent with its related subsidiaries (with 4, 3, 4, 2, ... rows).

A loop with a function then cleaned up each of the 8 data frames and calculated the distance from the parent to each subsidiary.
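The split step can be reproduced with base R on the question's data example. It gives one data frame per id (8 in total), and a small per-group function can then separate the parent row from its subsidiaries. This is a sketch of the idea, not the answerer's code; summarise_group is a hypothetical helper:

```r
enterprises <- structure(list(id = c("1111", "1111", "1111", "1111", "232", "232", "232", "232", "3123", "3123", "4455", "4455", "686", "333", "333", "14112", "14112", "14112", "3633", "3633"),
  parent_subsidiary = c("1","2", "2", "2", "1", "2", "2", "2", "1", "2", "1", "2", "1", "2", "1", "1", "2", "2", "1", "2"),
  city_cod = c(4305801L,4202404L, 4314803L, 4314902L, 4318705L, 1303403L, 4304507L, 4314100L, 2408102L, 3144409L, 5208707L, 4205407L, 5210000L, 3203908L, 3518800L, 3118601L, 4217303L, 3118601L, 5003702L, 5205109L)),
  row.names = c(NA, 20L), class = "data.frame")

# One data frame per id, named by id (ids sorted as text: "1111", "14112", "232", ...)
groups <- split(enterprises, enterprises$id)
sapply(groups, nrow)

# Hypothetical helper: parent city and number of subsidiaries for one group
summarise_group <- function(df) {
  is_parent <- df$parent_subsidiary == "1"
  data.frame(
    id           = df$id[1],
    parent_city  = df$city_cod[is_parent][1],
    n_subsidiary = sum(!is_parent)
  )
}
overview <- do.call(rbind, lapply(groups, summarise_group))
overview
```

A distance step would then compare parent_city's point with each subsidiary row inside the same function.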

I checked the distances in the first df of the list against distances from a website, and they were nearly identical.

The output is shown at [link]

Upvotes: 1
