How can I find and sum repeated values of two colums?

I have a df with three variable like this:

  ORI  DEST  PAX   
1 bog  mde   260
2 mde  bog   240
3 clo  bog   120
4 bog  ctg   50
5 ctg  bog   30
6 bog  clo   100

I want to sum the equal rows of ORI and DEST. For example, row 1 and row 2 are equal because bog-mde is equal than mde-bog. So PAX equal to 260 + 240 = 500. I want to have a output like this:

  ORI_DEST       PAX   
1 bog-mde-bog   500
2 clo-bog-clo   220
3 bog-ctg-bog   80

How you can see, match and sum row 1 with row 2, row 3 with 6 and row 4 with 5. This is similar to mix a vlookup with sum. Please, help me guys.

Upvotes: 1

Views: 47

Answers (2)

Uwe
Uwe

Reputation: 42582

Alternatively, the pmin() and pmax() functions can be used to construct a ROUTE identifier to aggregate by:

base R

aggregate(df["PAX"], 
          list(ROUTE = with(df, sprintf("%s-%s", pmin(ORI, DEST), pmax(ORI, DEST)))),
          sum)
    ROUTE PAX
1 bog-clo 220
2 bog-ctg  80
3 bog-mde 500

dplyr

library(dplyr)
df %>% 
  group_by(ROUTE = sprintf("%s-%s", pmin(ORI, DEST), pmax(ORI, DEST))) %>% 
  summarise(PAX = sum(PAX))
# A tibble: 3 x 2
  ROUTE     PAX
  <chr>   <int>
1 bog-clo   220
2 bog-ctg    80
3 bog-mde   500

data.table

library(data.table)
setDT(df)
df[, .(PAX = sum(PAX)), keyby = .(ROUTE = sprintf("%s-%s", pmin(ORI, DEST), pmax(ORI, DEST)))]
     ROUTE PAX
1: bog-clo 220
2: bog-ctg  80
3: bog-mde 500

Variant

Instead of

sprintf("%s-%s", pmin(ORI, DEST), pmax(ORI, DEST))

we can use

paste(pmin(ORI, DEST), pmax(ORI, DEST), sep = "-")

Data

library(data.table)
df <- fread("rn  ORI  DEST  PAX   
1 bog  mde   260
2 mde  bog   240
3 clo  bog   120
4 bog  ctg   50
5 ctg  bog   30
6 bog  clo   100", data.table = FALSE)

Upvotes: 0

Dasd
Dasd

Reputation: 43

aggregate(df["PAX"],
          list(ORI_DEST = apply(df[1:2], 1, function(x)
              paste(sort(x), collapse = ", "))),
          sum)
#  ORI_DEST PAX
#1 bog, clo 220
#2 bog, ctg  80
#3 bog, mde 500

Or use igraph

library(igraph)
g = graph.data.frame(df[c("ORI", "DEST")], directed = FALSE)
E(g)$weight = df$PAX
g2 = simplify(g, edge.attr.comb="sum")
data.frame(get.edgelist(g2), PAX = E(g2)$weight)
#   X1  X2 PAX
#1 bog mde 500
#2 bog clo 220
#3 bog ctg  80

Upvotes: 4

Related Questions