Reputation: 332
I am trying to create a square matrix of preferences or counts (doesn't really matter what it is, really) from data.table entries.
Let's say I have the following data.table
to work with:
library(data.table)
segment=c("track","track","track","round","round","sprint","sprint","sprint","sprint")
athlete=c("gunnar","brandon","raphael","gunnar","ben","brandon","raphael","ben","gunnar")
time=c(54,56,57,23,25,15,16,16,17)
df <- data.table(athlete,segment,time)
df[,time_diff:=min(time)-time,by=segment]
df[,winner:=athlete[1],by=segment]
athlete segment time time_diff winner
1: gunnar track 54 0 gunnar
2: brandon track 56 -2 gunnar
3: raphael track 57 -3 gunnar
4: raphael round 23 0 raphael
5: ben round 25 -2 raphael
6: brandon round 28 -5 raphael
7: brandon sprint 15 0 brandon
8: raphael sprint 16 -1 brandon
9: ben sprint 19 -4 brandon
10: gunnar sprint 26 -11 brandon
names <- unique(df$athlete)
[1] "gunnar" "brandon" "raphael" "ben"
Now I'd like to have a square matrix over the athletes that shows their time against the winner of each track, something akin to this:
gunnar brandon raphael ben
gunnar 0 -11 0 0
brandon -2 0 -5 0
raphael -3 -1 0 0
ben -2 -4 0 0
In my head I have some ideas to approach this, but nothing seems to be working out. I come from a MATLAB background, and there I'd just iterate, but I feel that's not a data.table
approach to it at all.
I feel like I should be able to accomplish it using a foreach
iteration over the athletes. Something along the lines of:
foreach(n=1:length(names)) %do% df[athlete==names[n],.(time_diff, winner),by=segment][,.(pref=sum(time_diff)),by=winner]
[[1]]
winner pref
1: gunnar 0
2: brandon -11
[[2]]
winner pref
1: gunnar -2
2: raphael -5
3: brandon 0
[[3]]
winner pref
1: gunnar -3
2: raphael 0
3: brandon -1
[[4]]
winner pref
1: raphael -2
2: brandon -4
But at this point I am stuck, and unsure as to how to proceed. I have some initial ideas, creating a vector of the appriate lenght vec <- vector(mode="double", length=length(names))
and then indexing it using which(names %in% df[,winner,by=IREALLYDONTKNOW])
, but as you can see, I am not clear how to properly approach it.
If anyone would give me some hints as to the correct data.table
approach I'd be very grateful.
Upvotes: 1
Views: 133
Reputation: 332
The way I solved it was actually fairly easy, after some realization:
names <- unique(df$athlete)
vec <- matrix(data = 0,nrow=length(names),ncol=length(names),dimnames=list(names,names))
pref <- foreach(n=1:length(names)) %do% df[athlete==names[n],.(time_diff, winner),by=segment][,.(pref=sum(time_diff)),by=winner]
foreach(n=1:length(names)) %do% (vec[names[n],pref[[n]]$winner] <- pref[[n]]$pref)
> vec
gunnar brandon raphael ben
gunnar 0 -11 0 0
brandon -2 0 -5 0
raphael -3 -1 0 0
ben 0 -4 -2 0
Upvotes: 0
Reputation: 4899
While running your code does not produce the table that is printed, I think what you are looking for is dcast.data.table
:
dt_compare <- dcast.data.table(df, athlete ~ winner, value.var = "time_diff")
# add zero columns for athletes that did not win
dt_compare[, setdiff(unique(athlete), names(dt_compare)) := 0]
# you can also reorder columns
setcolorder(dt_compare, c("athlete", dt_compare[["athlete"]]))
Upvotes: 2