stats_noob

Reputation: 5925

Merging a Matrix and a Data Frame in R

I started working with a shapefile in R. In this shapefile, each "boundary" is uniquely identified by a value in "col1" (e.g. ABC111, ABC112, ABC113, etc.):

library(sf)
library(igraph)
library(spdep)

sf <- sf::st_read("C:/Users/me/OneDrive/Documents/shape5/myshp.shp", options = "ENCODING=WINDOWS-1252")

head(sf)

Simple feature collection with 6 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 7201955 ymin: 927899.4 xmax: 7484015 ymax: 1191414
Projected CRS: PCS_Lambert_Conformal_Conic

      col1 col2    col3                       geometry
620 ABC111   99 Region1 MULTIPOLYGON (((7473971 119...
621 ABC112   99 Region1 MULTIPOLYGON (((7480277 118...
622 ABC113   99 Region1 MULTIPOLYGON (((7477124 118...
627 ABC114   99 Region1 MULTIPOLYGON (((7471697 118...
638 ABC115   99 Region1 MULTIPOLYGON (((7209908 928...
639 ABC116   99 Region1 MULTIPOLYGON (((7206683 937...

> dim(sf)
[1] 500   4

Using this post as a reference (https://gis.stackexchange.com/questions/403315/creating-adjacent-matrix-with-polygons-in-shapefile-with-r), I converted this shapefile into an adjacency matrix and an edge list:

mat <- nb2mat(poly2nb(sf), style = "B")
g1 <- graph_from_adjacency_matrix(mat)
g2 <- as_edgelist(g1)

> g1

IGRAPH 43082db D--- 513 2880 -- 
+ attr: color (v/c)
+ edges from 43082db:
  [1]  1->  3  1->  4  1-> 37  1-> 38  1-> 40  1-> 43  1-> 62  1->126  1->197  2->  3  2-> 24  2-> 37  2-> 38  2->125  3->  1  3->  2  3-> 38  3->125  3->197  3->198  3->241  3->265
 [23]  4->  1  4-> 43  4-> 44  4-> 62  4->126  5->  7  5->408  5->409  5->410  5->478  6->  7  6->150  6->153  6->291  6->411  6->476  7->  5  7->  6  7->168  7->169  7->170  7->291
 [45]  7->410  7->476  7->477  7->478  8-> 11  8-> 21  8->213  8->214  8->454  8->489  8->490  9-> 11  9-> 12  9-> 14  9-> 15  9-> 49  9->159  9->161  9->164  9->211  9->212  9->213
 [67]  9->223  9->324  9->325  9->326  9->336  9->337  9->343  9->379  9->380  9->383  9->384  9->385  9->386  9->387  9->390  9->395  9->396  9->397  9->413  9->461  9->464  9->465
 [89]  9->470  9->471  9->511  9->512 10-> 12 10-> 13 10-> 49 10-> 50 10->210 10->211 10->342 11->  8 11->  9 11-> 14 11->213 11->343 11->380 11->454 11->461 11->490 11->491 11->502
[111] 12->  9 12-> 10 12-> 13 12-> 15 12-> 17 12-> 49 12->354 12->395 12->402 12->513 13-> 10 13-> 12 13-> 50 13->193 13->208 13->342 13->430 13->439 13->503 14->  9 14-> 11 14-> 16
[133] 14-> 17 14->324 14->343 14->344 14->380 14->396 14->414 14->479 14->491 14->502 15->  9 15-> 12 15-> 17 15->324 15->395 15->396 15->413 16-> 14 16-> 17 16-> 18 16-> 19 16->303
[155] 16->310 16->311 16->348 16->349 16->350 16->400 16->401 16->403 16->414 16->479 16->480 16->481 16->482 16->491 17-> 12 17-> 14 17-> 15 17-> 16 17-> 18 17->303 17->310 17->346
+ ... omitted several edges

> head(g2)
     [,1] [,2]
[1,]    1    3
[2,]    1    4
[3,]    1   37
[4,]    1   38
[5,]    1   40
[6,]    1   43

Now, I also have a "Reference Table" that contains a variable for each value of "col1". For example, the number of days it rained for each value of "col1":

#simulate data
var1 = rep("ABC",500)
var2 = seq(111, 610, by=1)  # 500 values, one per row, so every col1 is unique
rainfall = rnorm(500, 60, 2)
reference = data.frame(col1 = paste0(var1,var2), rainfall)

> head(reference)

    col1 rainfall
1 ABC111 57.09933
2 ABC112 59.41411
3 ABC113 60.71370
4 ABC114 62.04429
5 ABC115 58.30965
6 ABC116 60.35608

I would like to merge the "reference" data frame with the "edge list" or with the "adjacency matrix". My naïve attempt to do this would look something like this:

# create an ID key:
reference$id = 1:nrow(reference)

g2 = data.frame(g2)
g2$id = g2$X1

merged = merge(x = g2 , y = reference, by = "id", all.x = TRUE)

The problem is that I don't know whether I have created the ID variable correctly. I simply assumed that the vertex numbers in "g2" follow the same order as the rows of "col1" in the shapefile, but I am not sure this assumption is correct.

Thank you!

Upvotes: 1

Views: 952

Answers (1)

Maël

Reputation: 52239

You can merge using the col1 variable as the ID, so you need to "convert" the first column of your edge list to the col1 ID. The vertex numbers in the edge list do follow the order of the adjacency matrix, so vertex 1 in the edge list corresponds to the first row of the adjacency matrix, and hence to the first row of your sf object.
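
If you want to convince yourself of this, a toy check along the following lines may help. It is only a sketch: the 3 x 3 matrix and the `ids` vector are made-up stand-ins for your `mat` and `sf$col1`, not your real data.

```r
library(igraph)

# Hypothetical IDs, listed in the same order as the rows of the matrix
ids <- c("ABC111", "ABC112", "ABC113")

# Toy binary adjacency matrix: row i / column i stand for the i-th polygon
mat <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0), nrow = 3, byrow = TRUE)

g  <- graph_from_adjacency_matrix(mat)
el <- as_edgelist(g)

# Each edge (i, j) in the edge list should point at a 1 in mat[i, j],
# and there should be exactly as many edges as there are 1s
check <- all(mat[el] == 1) && nrow(el) == sum(mat)
check

# Translate the vertex numbers back to the original IDs
data.frame(from = ids[el[, 1]], to = ids[el[, 2]])
```

On your own objects, the analogous check would be `all(mat[as_edgelist(g1)] == 1)`, assuming `g1` was built from `mat` exactly as in your question.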

First I would suggest using as_data_frame rather than as_edgelist.

g2 <- as_data_frame(g1)

Then, you can get the original IDs of your sf object using sf$col1[g2$from]:

g2$col1 <- sf$col1[g2$from]
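
An alternative, sketched here on toy data, is to store the IDs as the vertex `name` attribute before extracting the edges, so that `from` and `to` come out as IDs directly. The matrix and `ids` below are hypothetical; with your data the equivalent would presumably be `V(g1)$name <- sf$col1`, assuming `g1` has exactly one vertex per row of `sf`.

```r
library(igraph)

# Hypothetical IDs, in the same order as the rows of the matrix
ids <- c("ABC111", "ABC112", "ABC113")
mat <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0), nrow = 3, byrow = TRUE)

g <- graph_from_adjacency_matrix(mat)

# Label vertex i with the ID of row i
V(g)$name <- ids

# For a named graph, from/to hold the vertex names instead of the indices.
# The igraph:: prefix avoids a clash with other packages' as_data_frame().
edges_named <- igraph::as_data_frame(g, what = "edges")
edges_named
```

The explicit `igraph::` prefix is a precaution in case another loaded package masks `as_data_frame`.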

Finally, merge the two data frames:

merge(g2, reference)

Because your sf object is not available, here is a reproducible example using the data from the GIS SE post you linked:

library(sf)
library(spdep)
library(igraph)

# Data
sf <- sf::st_read("Cuencas_disponibilidad_2021.shp")
mat <- nb2mat(poly2nb(sf), style = "B")
g1 <- graph_from_adjacency_matrix(mat)

# Mock reference data frame
reference = data.frame(col1 = unique(sf$id_cuenca),
                       rainfall = rnorm(757))


# Create the dataframe from the adjacency matrix
g2 <- as_data_frame(g1)

# Use the original ID values for the `from` column:
g2$col1 <- sf$id_cuenca[g2$from]

# Merge
merge(g2, reference) |> head()

#   col1 from to   rainfall
# 1  101    1  2 -0.4494151
# 2  101    1  4 -0.4494151
# 3  101    1  5 -0.4494151
# 4  101    1  6 -0.4494151
# 5  102    4  1  1.6520035
# 6  102    4  6  1.6520035
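
Note that `merge()` only attaches the value for the `from` vertex here, and by default it sorts the result by the key. If you want the variable for both ends of each edge while keeping the original edge order, a lookup with `match()` is one option. The sketch below uses small made-up stand-ins (`ids`, `edges`, `reference`) rather than the real data:

```r
# Hypothetical stand-ins for sf$col1, the edge list and the reference table
ids <- c("ABC111", "ABC112", "ABC113")
edges <- data.frame(from = c(1, 1, 2, 3), to = c(2, 3, 1, 1))
reference <- data.frame(col1 = c("ABC113", "ABC111", "ABC112"),
                        rainfall = c(61.2, 57.1, 59.4))

# Translate the vertex numbers to IDs for both endpoints
edges$from_id <- ids[edges$from]
edges$to_id   <- ids[edges$to]

# match() finds each ID's row in the reference table, so the lookup works
# even when the reference table is in a different order than the shapefile
edges$rainfall_from <- reference$rainfall[match(edges$from_id, reference$col1)]
edges$rainfall_to   <- reference$rainfall[match(edges$to_id, reference$col1)]

edges
```

IDs that are missing from the reference table end up as `NA` rather than being dropped, which makes mismatches easy to spot.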

Upvotes: 3
