user14692575

Matching and indexing through two dataframes and one matrix

  1. I have a dataframe events with xy-coords of unique points.
  2. I have a dataframe all_nodes with xy-coords of network nodes. All points of events are also in all_nodes, but not necessarily only once, and at different positions, i.e., the index (row id) of a point in events does not correspond to all_nodes.
  3. I have a matrix ma of dimension nrow(all_nodes) times nrow(all_nodes) with calculated pairwise interaction terms between all nodes. The rows and cols of ma correspond to the index (row ids) of all_nodes.

My overall goal is to identify the row ids of events in all_nodes. With these I want to extract a submatrix of pairwise interactions from my matrix ma according to the detected row ids. Finally, I want to reorder the submatrix so that its row and column ids and the corresponding points match the row order of events. Any kind of help (code/reference/hint) is much appreciated!

Toy data (you can find real data below)

# coords of unique events 
events <- data.frame(x = c(1,2,3,4),
                     y = c(4,3,2,1))
# all_nodes 
all_nodes <- data.frame(x = c(2,1,120,3,150,4,1),
                     y = c(3,4,120,2,150,1,4))
# matrix corresponding to the index of all_nodes
ma <- matrix(data = rnorm(n = 49, mean = 3, sd = 1), 
             nrow = nrow(all_nodes), ncol = nrow(all_nodes))
ma[6, ] <- ma[2, ]

My attempt so far, which is not quite helpful because I ran into several problems.

# coords of unique events 
events # see toy data

# ------------------------------------------------
# all_nodes holds the rounded coords of all nodes of g, an object
# of class "sfnetwork" "tbl_graph" "igraph";
# ma is derived from g in several steps
# cols and rows in ma correspond to node ids of g/all_nodes

# all_nodes <- g %>% tidygraph::activate("nodes") %>%
# as.data.frame(geometry)
# all_nodes <- as.data.frame(matrix(unlist(all_nodes$geometry), ncol = 2, byrow = TRUE))
# names(all_nodes) <- c('x', 'y')
# all_nodes <- round(all_nodes, 2)
# --------------------------------------------------

# matching based on x-coord only 
ix <- which(all_nodes$x %in% events$x)
# Problem A
length(ix) == nrow(events) # different length
# Problem B
# and the event with coords x=1, y=4 occurs twice in ix 

sub <- ma[ix, ix]
# If problems A+B were eliminated, sub would correspond to
# all events, but the different indexing makes it unusable
# (several permutations possible)

I also played around with st_equals {sf} to compare geometries directly, using events <- sf::st_as_sf(events[, c('x', 'y')], coords = c('x', 'y')) in a previous step.

Real data

# removed 

Upvotes: 2

Views: 267

Answers (2)

ThomasIsCoding

Reputation: 102710

We can probably do the matching as follows

idx <- match(do.call(paste, events), do.call(paste, all_nodes))
ma[idx,idx]

or

idx <- match(asplit(events, 1), asplit(all_nodes, 1))
ma[idx, idx]
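
To make the three steps from the question concrete (find the row ids, extract the submatrix, keep the order of events), here is a minimal sketch on the toy data. The fixed seed, the NA guard and the dimnames are my own additions for illustration and are not part of the original code.

```r
# Toy data from the question
events <- data.frame(x = c(1, 2, 3, 4), y = c(4, 3, 2, 1))
all_nodes <- data.frame(x = c(2, 1, 120, 3, 150, 4, 1),
                        y = c(3, 4, 120, 2, 150, 1, 4))
set.seed(1)  # assumption: fixed seed only to make the sketch reproducible
ma <- matrix(rnorm(49, mean = 3, sd = 1), nrow = nrow(all_nodes))

# One "x y" key per row; match() returns, for each event,
# the first row of all_nodes with the same key
idx <- match(do.call(paste, events), do.call(paste, all_nodes))
idx  # 2 1 4 6

# Guard (my addition): stop if an event is missing from all_nodes
stopifnot(!anyNA(idx))

# Rows and cols of sub follow the row order of events
sub <- ma[idx, idx]
dimnames(sub) <- list(rownames(events), rownames(events))

# Entry [i, j] of sub is the interaction between event i and event j
identical(unname(sub[1, 2]), ma[idx[1], idx[2]])  # TRUE
```

Because match() only reports the first hit, the second copy of the point (1, 4) in row 7 of all_nodes is ignored here.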

Benchmark

TIC1 <- function() {
    match(do.call(paste, events), do.call(paste, all_nodes))
}

TIC2 <- function() {
    match(asplit(events, 1), asplit(all_nodes, 1))
}


GKi <- function() {
    match(interaction(events),interaction(all_nodes))
}

library(bench)
bm <- mark(
    TIC1(),
    TIC2(),
    GKi()
)
autoplot(bm)

gives

> bm
# A tibble: 3 x 13
  expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 TIC1()     175.9us 197.5us     4573.        0B     24.5  2052    11      449ms
2 TIC2()      30.2us  32.1us    28884.        0B     14.4  9995     5      346ms
3 GKi()      311.2us 349.1us     2741.    1.53KB     27.1  1212    12      442ms
# ... with 4 more variables: result <list>, memory <list>, time <list>,
#   gc <list>


Upvotes: 2

GKi

Reputation: 39727

interaction could be used to match on multiple columns.

idx <- match(interaction(events), interaction(all_nodes))
ma[idx,idx]
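
Since the question mentions that an event can occur more than once in all_nodes, it may help to see every occurrence and not only the first one that match() returns. The following is a rough sketch on the toy data; the names ev_key, node_key, all_hits and first_hit are made up for this illustration.

```r
# Toy data from the question
events <- data.frame(x = c(1, 2, 3, 4), y = c(4, 3, 2, 1))
all_nodes <- data.frame(x = c(2, 1, 120, 3, 150, 4, 1),
                        y = c(3, 4, 120, 2, 150, 1, 4))

# as.character() turns the interaction factors into plain "x.y" keys
ev_key <- as.character(interaction(events))
node_key <- as.character(interaction(all_nodes))

# For each event, collect every row of all_nodes with the same key
all_hits <- lapply(ev_key, function(k) which(node_key == k))
all_hits[[1]]  # 2 7: the event (1, 4) occurs twice in all_nodes

# Keeping only the first hit per event reproduces the match() result
first_hit <- vapply(all_hits, function(h) h[1], integer(1))
first_hit  # 2 1 4 6
```

One caveat: the default separator of interaction is ".", so with decimal coordinates two different points could end up with the same key (for example 1.5 and 2 versus 1 and 5.2). In that case passing another separator via the sep argument is probably safer.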

Upvotes: 3
