Reputation: 63
Here is the example data:
df <- data.frame("ID1" = c("A","A","B","C"),
"Wt1" = c(0.8,0.6,0.4,0.5),
"ID2" = c("B","A","C","B"),
"Wt2" = c(0.1,0.4,0.5,0.5),
"ID3" = c("C",NA,"C",NA),
"Wt3" = c(0.1,NA,0.1,NA))
And I would like to create a columns (vote) in the dataframe which is based on the argmax of wt from the groups of ID1, ID2, ID3. For example, in row 3 of the example data, sum of wt for "B" is 0.4 and sum of wt for "C" is 0.6, so vote = "C".
So the result will be like
ID1 Wt1 ID2 Wt2 ID3 Wt3 vote
1 A 0.8 B 0.1 C 0.1 A
2 A 0.6 A 0.4 <NA> NA A
3 B 0.4 C 0.5 C 0.1 C
4 C 0.5 B 0.5 <NA> NA C
In the case of tie (row 4 in the example), just select any of the ID values.Can someone suggest a solution?
Upvotes: 4
Views: 68
Reputation: 2539
First, it's very difficult to manipulate tables formatted like this. This is not your desired output but I'm afraid you might be stuck further down the road.
One suggestion is to format the table so that we can easily retrieve information out of it.
Assign id to each observation
df$obs <- 1:nrow(df)
Then put them in a long format
df1 <- do.call("rbind",lapply(seq(1,6,2),function(x) {df <- df[,c(x: (x+1),7)];
colnames(df) <- c("ID","Wt","obs"); df}))
Then my skills with data.frame
are limited so I switch to data.table
package.
dt <- as.data.table(df1)
we sum the number of votes by obs and ID
dt[,total:=sum(Wt,na.rm=TRUE),.(obs,ID)]
Then it's very easy to retrieve the information.
dt[,vote:=.SD[which.max(total)],obs]
#dt
# ID Wt obs total vote
# 1: A 0.8 1 0.8 A
# 2: A 0.6 2 1.0 A
# 3: B 0.4 3 0.4 C
# 4: C 0.5 4 0.5 C
# 5: B 0.1 1 0.1 A
# 6: A 0.4 2 1.0 A
# 7: C 0.5 3 0.6 C
# 8: B 0.5 4 0.5 C
# 9: C 0.1 1 0.1 A
# 10: NA NA 2 0.0 A
# 11: C 0.1 3 0.6 C
# 12: NA NA 4 0.0 C
Upvotes: 1