Reputation: 640
My data set looks like this:
group <- c(1,2,3,4,5,6,7)
item1.sep <- sample(1:10,7)
item2.sep <- sample(1:10,7)
item3.sep<- sample(1:10,7)
item4.sep<- sample(1:10,7)
item5.sep<- sample(1:10,7)
item1.oct<- sample(1:10,7)
item2.oct <- sample(1:10,7)
item3.oct<- sample(1:10,7)
item4.oct<- sample(1:10,7)
item5.oct<- sample(1:10,7)
df <- data.frame(group,item1.sep,item2.sep,item3.sep,item4.sep,item5.sep,
item1.oct,item2.oct,item3.oct,item4.oct,item5.oct)
group item1.sep item2.sep.... item5.oct
1 9 9 4
2 4 4 7
3 7 7 2
4 3 8 5
5 8 3 1
6 6 10 8
7 10 2 6
And I want to create 2 new columns containing the top 2 item as character for each month. Maybe a combination of max and merge would help in this one
Top2_Sept Top2_Oct group.... item5.oct
item3.sep,item2.sep item5.Oct,item2.Oct 1 9
item4.sep,item1.sep . 2 4
item2.sep,item5.sep . . .
item4.sep,item2.sep . . .
item1.sep,item3.sep . . .
item2.sep,item5.sep . . .
item4.sep,item1.sep . .
Upvotes: 0
Views: 71
Reputation: 59425
Here's a data.table solution.
library(data.table)
DT <- as.data.table(df)
DT[,Top2_Sept:=paste(names(.SD)[order(unlist(.SD),decreasing=TRUE)[1:2]],collapse=","),
.SDcols=2:6,by=group]
DT[,Top2_Oct:=paste(names(.SD)[order(unlist(.SD),decreasing=TRUE)[1:2]],collapse=","),
.SDcols=7:11,by=group]
DT[,list(group,Top2_Sept,Top2_Oct)]
# group Top2_Sept Top2_Oct
# 1: 1 item5.sep,item3.sep item2.oct,item1.oct
# 2: 2 item2.sep,item4.sep item1.oct,item5.oct
# 3: 3 item5.sep,item3.sep item3.oct,item2.oct
# 4: 4 item4.sep,item1.sep item5.oct,item1.oct
# 5: 5 item2.sep,item4.sep item4.oct,item5.oct
# 6: 6 item1.sep,item2.sep item1.oct,item2.oct
# 7: 7 item1.sep,item2.sep item1.oct,item2.oct
Your example is not quite reproducible because you did not set.seed(...)
prior to creating the random samples. The result above will reproduce if you set.seed(1)
at the beginning.
Also, there is an ambiguity in your rules. Suppose item1.sep:item5.sep
for a given row is (8,7,7,6,5)
. Then the top item is in the first col, but the second item could be in the second or third column. You present no rule to resolve that.
Upvotes: 2
Reputation: 887891
Try
lst1 <- split(colnames(df)[-1],sub(".*\\.", '',colnames(df)[-1]))
df[paste("Top2", c("Oct", "Sept"), sep="_")] <- lapply(lst1, function(x) {
nm1 <- colnames(df[x])
apply(df[x], 1, function(.x)
toString(nm1[order(.x, decreasing=TRUE)[1:2]]))})
Or
lst1 <- lapply(month.abb[9:10], function(x)
df[grep(x, colnames(df), ignore.case=TRUE)])
nm1 <- lapply(lst1, colnames)
f1 <- function(x,y) apply(x, 1, function(.x)
toString(y[order(.x, decreasing=TRUE)[1:2]]))
df[paste('Top2', month.abb[9:10], sep="_")] <- Map(f1, lst1, nm1)
Upvotes: 1