Franz Hahn
Franz Hahn

Reputation: 332

Creating a square preference matrix from a data.table

I am trying to create a square matrix of preferences or counts (doesn't really matter what it is, really) from data.table entries.

Let's say I have the following data.table to work with:

library(data.table)

segment=c("track","track","track","round","round","sprint","sprint","sprint","sprint")
athlete=c("gunnar","brandon","raphael","gunnar","ben","brandon","raphael","ben","gunnar")
time=c(54,56,57,23,25,15,16,16,17)

df <- data.table(athlete,segment,time)

df[,time_diff:=min(time)-time,by=segment]

df[,winner:=athlete[1],by=segment]

    athlete segment time time_diff  winner
 1:  gunnar   track   54         0  gunnar
 2: brandon   track   56        -2  gunnar
 3: raphael   track   57        -3  gunnar
 4: raphael   round   23         0 raphael
 5:     ben   round   25        -2 raphael
 6: brandon   round   28        -5 raphael
 7: brandon  sprint   15         0 brandon
 8: raphael  sprint   16        -1 brandon
 9:     ben  sprint   19        -4 brandon
10:  gunnar  sprint   26       -11 brandon

names <- unique(df$athlete)

[1] "gunnar"  "brandon" "raphael" "ben" 

Now I'd like to have a square matrix over the athletes that shows their time against the winner of each track, something akin to this:

        gunnar  brandon  raphael  ben
gunnar     0     -11        0      0       
brandon   -2       0       -5      0
raphael   -3      -1        0      0
ben       -2      -4        0      0

In my head I have some ideas to approach this, but nothing seems to be working out. I come from a MATLAB background, and there I'd just iterate, but I feel that's not a data.table approach to it at all.

I feel like I should be able to accomplish it using a foreach iteration over the athletes. Something along the lines of:

foreach(n=1:length(names)) %do% df[athlete==names[n],.(time_diff, winner),by=segment][,.(pref=sum(time_diff)),by=winner]

[[1]]
    winner pref
1:  gunnar    0
2: brandon  -11

[[2]]
    winner pref
1:  gunnar   -2
2: raphael   -5
3: brandon    0

[[3]]
    winner pref
1:  gunnar   -3
2: raphael    0
3: brandon   -1

[[4]]
    winner pref
1: raphael   -2
2: brandon   -4

But at this point I am stuck, and unsure as to how to proceed. I have some initial ideas, creating a vector of the appriate lenght vec <- vector(mode="double", length=length(names)) and then indexing it using which(names %in% df[,winner,by=IREALLYDONTKNOW]), but as you can see, I am not clear how to properly approach it.

If anyone would give me some hints as to the correct data.table approach I'd be very grateful.

Upvotes: 1

Views: 133

Answers (2)

Franz Hahn
Franz Hahn

Reputation: 332

The way I solved it was actually fairly easy, after some realization:

names <- unique(df$athlete)

vec <- matrix(data = 0,nrow=length(names),ncol=length(names),dimnames=list(names,names))

pref <- foreach(n=1:length(names)) %do% df[athlete==names[n],.(time_diff, winner),by=segment][,.(pref=sum(time_diff)),by=winner]

foreach(n=1:length(names)) %do% (vec[names[n],pref[[n]]$winner] <- pref[[n]]$pref)

> vec
        gunnar brandon raphael ben
gunnar       0     -11       0   0
brandon     -2       0      -5   0
raphael     -3      -1       0   0
ben          0      -4      -2   0

Upvotes: 0

paljenczy
paljenczy

Reputation: 4899

While running your code does not produce the table that is printed, I think what you are looking for is dcast.data.table:

dt_compare <- dcast.data.table(df, athlete ~ winner, value.var = "time_diff")
# add zero columns for athletes that did not win
dt_compare[, setdiff(unique(athlete), names(dt_compare)) := 0]
# you can also reorder columns
setcolorder(dt_compare, c("athlete", dt_compare[["athlete"]]))

Upvotes: 2

Related Questions