Hjalmar

Reputation: 122

Matching come and go to location data

I have movement data with a from and a to column, and I would like to summarise it per location: for example, how many things come to or go from location 1.

library(data.table)

test = data.table(day = c(1,1,1), from = c(1,2,1), to = c(3,1,3))
test[, total_from := .N, by = c("day", "from")]
test[, total_to := .N, by = c("day", "to")]


   day from to total_from total_to
1:   1    1  3          2        2
2:   1    2  1          1        1
3:   1    1  3          2        2

I cannot simply add the two total columns, as they do not relate to the same location. My preferred outcome would be:

   day location count
1:   1        1     3
2:   1        2     1
3:   1        3     2

I am quite certain R has some function for this, but I don't know what to search for. If someone could point me in the right direction, that would be very helpful.

Upvotes: 2

Views: 40

Answers (1)

shadowtalker

Reputation: 13833

What you have is graph data. Each location is called either a vertex or a node (they are interchangeable), and the quantity you are trying to compute is called the degree of a node. The "in degree" is the total amount flowing in, and the "out degree" is the total amount flowing out; the "degree" overall is the sum of the two.
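
To make the terminology concrete, here is a minimal sketch on the data from the question. It counts out-degree and in-degree separately with plain data.table grouping. The names out_degree, in_degree, n_out and n_in are illustrative only and do not appear in the question.

```r
library(data.table)

test <- data.table(day = c(1, 1, 1), from = c(1, 2, 1), to = c(3, 1, 3))

# out-degree: how many movements leave each location on each day
out_degree <- test[, .(n_out = .N), by = .(day, location = from)]

# in-degree: how many movements arrive at each location on each day
in_degree <- test[, .(n_in = .N), by = .(day, location = to)]
```

In this data, location 1 has out-degree 2 and in-degree 1, so its degree is 3, which is the count the question asks for.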

There are two possible solutions:

  1. Compute in-degree and out-degree by reshaping your data and summing, as in the comment by Jaap.
  2. Use a graph library to construct the graph and compute degree efficiently.

I'll demonstrate both.

Reshaping

The first solution uses the "reshape" functionality from data.table, since you're already using that package:

test <- data.table(
    day = c(1,1,1),
    from = c(1,2,1),
    to = c(3,1,3)
)

Your data is currently in "wide" format: the from and to columns are variations of each other, since both hold a location, one for each end of a movement on a given day.

test_long <- melt(test, id.vars = "day", variable.name = "direction", value.name = "location")

melt reshapes this data into "long" format: you have one column indicating the location, and another indicating the direction. Now you are free to obtain your answer by simply grouping on day and location and computing the number of instances of each group:

test_totals <- test_long[, .N, by = .(day, location)]
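
Put together, the reshaping approach could look like the sketch below. Naming the count column count and using keyby rather than by (which also sorts the result) are additions here, made so that the output lines up with the table in the question. The test_by_direction table is likewise an illustrative extra.

```r
library(data.table)

test <- data.table(day = c(1, 1, 1), from = c(1, 2, 1), to = c(3, 1, 3))

# one row per (day, direction, location): each movement contributes two rows
test_long <- melt(test, id.vars = "day",
                  variable.name = "direction", value.name = "location")

# count rows per day and location; keyby also sorts by the grouping columns
test_totals <- test_long[, .(count = .N), keyby = .(day, location)]

# the same counts split by direction ("to" is in-degree, "from" is out-degree)
test_by_direction <- test_long[, .(count = .N),
                               keyby = .(day, location, direction)]
```

For the example data, test_totals holds the counts 3, 1 and 2 for locations 1, 2 and 3 on day 1.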

Graph analysis

Fortunately, data.table is very efficient for "groupby" operations.

However, graph data structures are generally more flexible, and often more efficient, once you need more than simple counts. igraph is a powerful and easy-to-use graph analysis package, backed by a well-built C library.

library(data.table)
library(igraph)

test <- data.table(
    day = c(1,1,1),
    from = c(1,2,1),
    to = c(3,1,3)
)

# the first 2 columns must be the vertices; each row is an edge
# other columns are treated as edge attributes
# graph_from_data_frame() is the current name for graph.data.frame()
g <- graph_from_data_frame(test[, .(from, to, day)])

# returns a named vector of degree for each node
totals <- degree(g)
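
One caveat: degree(g) counts edges across all days, because day is stored only as an edge attribute. If per-day counts are needed, one possible approach is to build a separate graph for each day. The sketch below does that with data.table grouping and also shows the mode argument for the directional variants. The names in_deg, out_deg and degree_by_day are hypothetical, and wrapping the input in as.data.frame() is a defensive assumption rather than a documented requirement.

```r
library(data.table)
library(igraph)

test <- data.table(day = c(1, 1, 1), from = c(1, 2, 1), to = c(3, 1, 3))

g <- graph_from_data_frame(as.data.frame(test[, .(from, to, day)]))

# directional variants of the same call
in_deg  <- degree(g, mode = "in")
out_deg <- degree(g, mode = "out")

# build one graph per day and tabulate the degree of each vertex
degree_by_day <- test[, {
    g_day <- graph_from_data_frame(as.data.frame(.SD))
    d <- degree(g_day)
    # vertex names are stored as character, so convert back to numeric
    .(location = as.numeric(names(d)), count = as.numeric(d))
}, by = day, .SDcols = c("from", "to")]
```

With only one day in the example data, degree_by_day contains the same counts as degree(g), but in a table keyed by day and location, like the one requested in the question.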

Upvotes: 3
