Wei Wang
Wei Wang

Reputation: 63

R: how do you do row sums based on the grouping from other variables?

Here is the example data:

df <- data.frame("ID1" = c("A","A","B","C"), 
            "Wt1" = c(0.8,0.6,0.4,0.5),
            "ID2" = c("B","A","C","B"),
            "Wt2" = c(0.1,0.4,0.5,0.5),
            "ID3" = c("C",NA,"C",NA), 
            "Wt3" = c(0.1,NA,0.1,NA))

And I would like to create a columns (vote) in the dataframe which is based on the argmax of wt from the groups of ID1, ID2, ID3. For example, in row 3 of the example data, sum of wt for "B" is 0.4 and sum of wt for "C" is 0.6, so vote = "C".

So the result will be like

  ID1 Wt1 ID2 Wt2  ID3 Wt3 vote
1   A 0.8   B 0.1    C 0.1    A
2   A 0.6   A 0.4 <NA>  NA    A
3   B 0.4   C 0.5    C 0.1    C
4   C 0.5   B 0.5 <NA>  NA    C

In the case of tie (row 4 in the example), just select any of the ID values.Can someone suggest a solution?

Upvotes: 4

Views: 68

Answers (1)

DJJ
DJJ

Reputation: 2539

First, it's very difficult to manipulate tables formatted like this. This is not your desired output but I'm afraid you might be stuck further down the road.

One suggestion is to format the table so that we can easily retrieve information out of it.

Assign id to each observation

df$obs <- 1:nrow(df)

Then put them in a long format

  df1 <- do.call("rbind",lapply(seq(1,6,2),function(x) {df <- df[,c(x: (x+1),7)]; 
colnames(df) <- c("ID","Wt","obs"); df}))

Then my skills with data.frame are limited so I switch to data.table package.

dt <- as.data.table(df1)

we sum the number of votes by obs and ID

dt[,total:=sum(Wt,na.rm=TRUE),.(obs,ID)]

Then it's very easy to retrieve the information.

dt[,vote:=.SD[which.max(total)],obs]

#dt
#    ID  Wt obs total vote
# 1:  A 0.8   1   0.8    A
# 2:  A 0.6   2   1.0    A
# 3:  B 0.4   3   0.4    C
# 4:  C 0.5   4   0.5    C
# 5:  B 0.1   1   0.1    A
# 6:  A 0.4   2   1.0    A
# 7:  C 0.5   3   0.6    C
# 8:  B 0.5   4   0.5    C
# 9:  C 0.1   1   0.1    A
# 10: NA  NA   2   0.0    A
# 11:  C 0.1   3   0.6    C
# 12: NA  NA   4   0.0    C

Upvotes: 1

Related Questions