Reputation: 122
I have data on movement, from and to, and would like to say something about a location, for example, how many things come and go from location 1.
test = data.table(day = c(1,1,1), from = c(1,2,1), to =c(3,1,3))
test[ , `:=`( total_from = .N) , by = c("day", "from") ]
test[ , `:=`( total_to = .N) , by = c("day", "to") ]
day from to total_from total_to
1: 1 1 3 2 2
2: 1 2 1 1 1
3: 1 1 3 2 2
I cannot simply add the two total columns, as they do not relate to the same location. My preferred outcome would be:
day location count
1: 1 1 3
2: 1 2 1
3: 1 3 2
I don't know what to search for, as I am quite certain R has some function for this, so maybe if someone could send me in the right direction that would be very helpful.
Upvotes: 2
Views: 40
Reputation: 13833
What you have is graph data. Each location is called either a vertex or a node (they are interchangeable), and the quantity you are trying to compute is called the degree of a node. The "in degree" is the total amount flowing in, and the "out degree" is the total amount flowing out; the "degree" overall is the sum of the two.
There are two possible solutions: reshape the data with data.table, or treat it as a graph with a package such as igraph. I'll demonstrate both.
The first solution uses the "reshape" functionality from data.table, since you're already using that package:
library(data.table)
test <- data.table(
day = c(1,1,1),
from = c(1,2,1),
to = c(3,1,3)
)
Your data is currently in "wide" format: two of your columns (from and to) hold the same kind of value, a location, and differ only in whether the movement leaves or enters that node on a given day.
test_long <- melt(test, id.vars = "day", variable.name = "direction", value.name = "location")
melt reshapes this data into "long" format: you have one column indicating the location, and another indicating the direction. Now you are free to obtain your answer by simply grouping on day and location and counting the rows in each group:
test_totals <- test_long[, .N, by = .(day, location)]
Fortunately, data.table is very efficient for "groupby" operations.
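If you also want the breakdown by direction, the same long table can give the out degree, in degree, and total in one grouped call. This is only a sketch on the example data from the question; the column names out_degree, in_degree, and count are my own choice, not anything data.table requires.

```r
library(data.table)

test <- data.table(
  day  = c(1, 1, 1),
  from = c(1, 2, 1),
  to   = c(3, 1, 3)
)

# long format: one row per (day, direction, location)
test_long <- melt(test, id.vars = "day",
                  variable.name = "direction", value.name = "location")

# keyby groups and sorts by day and location in one step
# out_degree / in_degree / count are illustrative column names
test_degrees <- test_long[, .(
  out_degree = sum(direction == "from"),  # rows where the location is the origin
  in_degree  = sum(direction == "to"),    # rows where the location is the destination
  count      = .N                         # total degree: in + out
), keyby = .(day, location)]

print(test_degrees)
# count is 3, 1, 2 for locations 1, 2, 3, matching the desired output
```

Using keyby instead of by just keeps the result ordered by day and location, which makes it easier to compare against the table in the question.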
However, graph data structures are generally more flexible and efficient. igraph is a powerful and easy-to-use graph analysis package, backed by a well-built C library.
library(igraph)
test <- data.table(
day = c(1,1,1),
from = c(1,2,1),
to = c(3,1,3)
)
# the first 2 columns must be the vertices; each row is an edge
# other columns are treated as edge attributes
g <- graph_from_data_frame(test[, .(from, to, day)], directed = TRUE)
# returns a named vector of degree for each node
totals <- degree(g)
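To separate what flows in from what flows out, degree takes a mode argument. The sketch below rebuilds the graph from a plain data.frame (the edges name is just for illustration) and ignores the day column, so it counts over all days at once; if you need per-day totals you would have to build one graph per day or fall back on the data.table approach.

```r
library(igraph)

# illustrative edge list: same movements as in the question, without day
edges <- data.frame(
  from = c(1, 2, 1),
  to   = c(3, 1, 3)
)

# vertices are named after the location ids: "1", "2", "3"
g <- graph_from_data_frame(edges, directed = TRUE)

out_deg <- degree(g, mode = "out")  # number of edges leaving each node
in_deg  <- degree(g, mode = "in")   # number of edges entering each node
total   <- degree(g, mode = "all")  # in + out

print(data.frame(location = names(total), out_deg, in_deg, total))
# total is 3, 1, 2 for locations "1", "2", "3"
```

The results are named vectors keyed by vertex name, so total[["1"]] gives the count for location 1.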
Upvotes: 3