Marc
Marc

Reputation: 661

Sum for unique combinations of variables in a data table

I have a data table with the following format representing relations strength between countries for many categories of relations:

Country1    Country2     Value     Category
A           A            4         1
A           B            2         1
A           C            9         1
B           A            3         2
B           D            4         1
C           A            2         2
D           C            7         2
...

Now I would like to sum all mutual relations (e.g. A-B and B-A; D-C and C-D, etc...) for each category (A-B and B-A needs to be "merged").

What could be a concise and "very R" solution to do it? Is there any existing function that could do this?

For now, I've set a key with both "Country1" and "Country2" columns but didn't find what I could do next for matching corresponding rows..

Thanks for any clue.

Upvotes: 1

Views: 1242

Answers (3)

Henk
Henk

Reputation: 3656

# x = your data as data.table
x[,  rel :=paste(fromCountry, toCountry, sep = "-")]
x[, .(sums = sum(Value)), by = rel]

# if fromCountry - toCountry is considered to be bidirectional, then make sets:

library(Kmisc)
x[, sets := Kmisc::str_sort(rel)]
x[, .(sum = sum(Value)), by = sets]

Upvotes: 0

Arun
Arun

Reputation: 118779

Use pmin and pmax..

require(data.table) # v1.9.6
dt = fread("Country1    Country2     Value     Category
A           A            4         1
A           B            2         1
A           C            9         1
B           A            3         2
B           D            4         1
C           A            2         2
D           C            7         2")
dt[, .(total = sum(Value)), 
     by=.(Country1 = pmin(Country1, Country2), 
          Country2 = pmax(Country1, Country2))]
#    Country1 Country2 total
# 1:        A        A     4
# 2:        A        B     5
# 3:        A        C    11
# 4:        B        D     4
# 5:        C        D     7

If you want this within Category, just add it as well to by.

Upvotes: 7

Backlin
Backlin

Reputation: 14842

How you "sum all mutual bipartite relations" depends on what you want to do.

To count the number of relations in each category:

x[, .N, by=Category]

To sum up the value for all relationships in each category:

x[, sum(Value), by=Category]

or this for prettier output:

x[, list(TotalValue = sum(Value)), by=Category]

Upvotes: 1

Related Questions