Reputation: 23
I have a df with three variable like this:
ORI DEST PAX
1 bog mde 260
2 mde bog 240
3 clo bog 120
4 bog ctg 50
5 ctg bog 30
6 bog clo 100
I want to sum the equal rows of ORI and DEST. For example, row 1 and row 2 are equal because bog-mde is equal than mde-bog. So PAX equal to 260 + 240 = 500. I want to have a output like this:
ORI_DEST PAX
1 bog-mde-bog 500
2 clo-bog-clo 220
3 bog-ctg-bog 80
How you can see, match and sum row 1 with row 2, row 3 with 6 and row 4 with 5. This is similar to mix a vlookup with sum. Please, help me guys.
Upvotes: 1
Views: 47
Reputation: 42582
Alternatively, the pmin()
and pmax()
functions can be used to construct a ROUTE
identifier to aggregate by:
aggregate(df["PAX"],
list(ROUTE = with(df, sprintf("%s-%s", pmin(ORI, DEST), pmax(ORI, DEST)))),
sum)
ROUTE PAX 1 bog-clo 220 2 bog-ctg 80 3 bog-mde 500
library(dplyr)
df %>%
group_by(ROUTE = sprintf("%s-%s", pmin(ORI, DEST), pmax(ORI, DEST))) %>%
summarise(PAX = sum(PAX))
# A tibble: 3 x 2 ROUTE PAX <chr> <int> 1 bog-clo 220 2 bog-ctg 80 3 bog-mde 500
library(data.table)
setDT(df)
df[, .(PAX = sum(PAX)), keyby = .(ROUTE = sprintf("%s-%s", pmin(ORI, DEST), pmax(ORI, DEST)))]
ROUTE PAX 1: bog-clo 220 2: bog-ctg 80 3: bog-mde 500
Instead of
sprintf("%s-%s", pmin(ORI, DEST), pmax(ORI, DEST))
we can use
paste(pmin(ORI, DEST), pmax(ORI, DEST), sep = "-")
library(data.table)
df <- fread("rn ORI DEST PAX
1 bog mde 260
2 mde bog 240
3 clo bog 120
4 bog ctg 50
5 ctg bog 30
6 bog clo 100", data.table = FALSE)
Upvotes: 0
Reputation: 43
aggregate(df["PAX"],
list(ORI_DEST = apply(df[1:2], 1, function(x)
paste(sort(x), collapse = ", "))),
sum)
# ORI_DEST PAX
#1 bog, clo 220
#2 bog, ctg 80
#3 bog, mde 500
Or use igraph
library(igraph)
g = graph.data.frame(df[c("ORI", "DEST")], directed = FALSE)
E(g)$weight = df$PAX
g2 = simplify(g, edge.attr.comb="sum")
data.frame(get.edgelist(g2), PAX = E(g2)$weight)
# X1 X2 PAX
#1 bog mde 500
#2 bog clo 220
#3 bog ctg 80
Upvotes: 4