Fastest Way to Split Data Frame by Group, shuffle single vector in R

Question

I am familiar with some of the split-apply-combine functions in R, like ddply, but I am unsure how to split a data frame, modify a single variable within each subset, and then recombine the subsets. I can do this manually, but there is surely a better way.

In my example, I want to shuffle a single variable (and none of the others) within each group. This is for a permutation analysis, so I repeat the shuffle many times and would like it to be as fast as possible.

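library(plyr)  # provides rbind.fill()

allS <- split(all, f = all$cp)                  # one data frame per level of cp
for (j in seq_along(allS)) {
    allS[[j]]$party <- sample(allS[[j]]$party)  # shuffle party within the group
}
tmpAll <- rbind.fill(allS)                      # recombine the subsets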

Sample data frame:

all <- data.frame(cp=factor(1:5), party=rep(c("A","B","C","D"), 5))

Thanks for any direction!

akrun · Accepted Answer

We can use data.table. setDT(all) converts the 'data.frame' to a 'data.table' by reference (no copy is made). Then, grouped by 'cp', we sample 'party' and assign (:=) the result back to the 'party' column in place, so there is no need to split and recombine the data.

library(data.table)
setDT(all)[, party:= sample(party) , by = cp]
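To see that the grouped assignment only reorders 'party' inside each 'cp' group, here is a minimal sketch of a check on the sample data from the question. The name `dat` (used instead of `all`, to avoid masking base::all()), the seed, and `stringsAsFactors = FALSE` are assumptions added for illustration; they are not part of the original post.

```r
library(data.table)

set.seed(1)  # assumed seed, only so the sketch is reproducible

# Sample data from the question, under a hypothetical name 'dat'
dat <- data.frame(cp = factor(1:5),
                  party = rep(c("A", "B", "C", "D"), 5),
                  stringsAsFactors = FALSE)

before    <- table(dat$cp, dat$party)  # per-group party counts before shuffling
cp_before <- dat$cp                    # grouping column before shuffling

# Shuffle 'party' within each 'cp' group, updating the column in place
setDT(dat)[, party := sample(party), by = cp]

after <- table(dat$cp, dat$party)      # per-group party counts after shuffling

# Counts per group are unchanged and 'cp' itself is untouched
stopifnot(identical(before, after), identical(cp_before, dat$cp))
```

For a permutation analysis, the `:=` line can simply be re-run inside a loop, since each call reshuffles the column in place. One caveat: if the shuffled column were numeric and a group had a single value x >= 1, sample(x) would draw from 1:x rather than return x, so that case would need separate handling.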
