Creating Column in R Containing the top n highest row

Question

My data set looks like this:

    group <- c(1,2,3,4,5,6,7)
    item1.sep <- sample(1:10,7)
    item2.sep <- sample(1:10,7)
    item3.sep<- sample(1:10,7)
    item4.sep<- sample(1:10,7)
    item5.sep<- sample(1:10,7)
    item1.oct<- sample(1:10,7)
    item2.oct <- sample(1:10,7)
    item3.oct<- sample(1:10,7)
    item4.oct<- sample(1:10,7)
    item5.oct<- sample(1:10,7)

df <- data.frame(group,item1.sep,item2.sep,item3.sep,item4.sep,item5.sep,
                 item1.oct,item2.oct,item3.oct,item4.oct,item5.oct)


    group   item1.sep   item2.sep....   item5.oct
       1       9           9             4  
       2       4           4             7  
       3       7           7             2  
       4       3           8             5  
       5       8           3             1  
       6       6          10             8  
       7      10           2             6

And I want to create 2 new columns containing the top 2 item as character for each month. Maybe a combination of max and merge would help in this one

          Top2_Sept   Top2_Oct             group....    item5.oct
 item3.sep,item2.sep  item5.Oct,item2.Oct      1         9  
 item4.sep,item1.sep       .                   2         4          
 item2.sep,item5.sep       .                   .         .
 item4.sep,item2.sep       .                   .         .
 item1.sep,item3.sep       .                   .         .
 item2.sep,item5.sep       .                   .         .
 item4.sep,item1.sep       .                   .

jlhoward · Accepted Answer

Here's a data.table solution.

library(data.table)
DT <- as.data.table(df)
DT[,Top2_Sept:=paste(names(.SD)[order(unlist(.SD),decreasing=TRUE)[1:2]],collapse=","),
          .SDcols=2:6,by=group]
DT[,Top2_Oct:=paste(names(.SD)[order(unlist(.SD),decreasing=TRUE)[1:2]],collapse=","),
          .SDcols=7:11,by=group]
DT[,list(group,Top2_Sept,Top2_Oct)]
#    group           Top2_Sept            Top2_Oct
# 1:     1 item5.sep,item3.sep item2.oct,item1.oct
# 2:     2 item2.sep,item4.sep item1.oct,item5.oct
# 3:     3 item5.sep,item3.sep item3.oct,item2.oct
# 4:     4 item4.sep,item1.sep item5.oct,item1.oct
# 5:     5 item2.sep,item4.sep item4.oct,item5.oct
# 6:     6 item1.sep,item2.sep item1.oct,item2.oct
# 7:     7 item1.sep,item2.sep item1.oct,item2.oct

Your example is not quite reproducible because you did not set.seed(...) prior to creating the random samples. The result above will reproduce if you set.seed(1) at the beginning.

Also, there is an ambiguity in your rules. Suppose item1.sep:item5.sep for a given row is (8,7,7,6,5). Then the top item is in the first col, but the second item could be in the second or third column. You present no rule to resolve that.

Creating Column in R Containing the top n highest row

Answers (2)

Related Questions