Michael Davidson

Reputation: 1411

Fastest Way to Split Data Frame by Group, shuffle single vector in R

I am familiar with some of the split-apply-combine functions in R, like ddply, but I am unsure how to split a data frame, modify a single variable within each subset, and then recombine the subsets. I can do this manually, but there is surely a better way.

In my example, I am trying to shuffle a single variable (and none of the others) within each group. This is for a permutation analysis, so the shuffle is repeated many times and I would like it to be as fast as possible.

library(plyr)  # provides rbind.fill()
allS <- split(all, f = all$cp)
for (j in seq_along(allS)) {
    # shuffle party within this group only
    allS[[j]]$party <- sample(x = allS[[j]]$party)
}
tmpAll <- rbind.fill(allS)

Sample data frame:

all <- data.frame(cp=factor(1:5), party=rep(c("A","B","C","D"), 5))

Thanks for any direction!

Upvotes: 3

Views: 1439

Answers (2)

Ven Yao

Reputation: 3710

The dplyr way.

library(dplyr)
all %>% group_by(cp) %>% mutate(party=sample(party))
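
To check that this shuffles only within groups, here is a minimal sketch using the sample data from the question. The name `dat`, the fixed seed, and the `ungroup()` call are my own assumptions for illustration (the data frame is renamed only to avoid confusion with `base::all()`).

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible

# Sample data from the question, renamed from `all` to `dat` (hypothetical name)
dat <- data.frame(cp = factor(1:5), party = rep(c("A", "B", "C", "D"), 5))

shuffled <- dat %>%
  group_by(cp) %>%
  mutate(party = sample(party)) %>%  # permute party within each cp group
  ungroup()                          # drop the grouping before further use

# The per-group counts of each party label should be unchanged by the shuffle
before <- table(dat$cp, dat$party)
after  <- table(shuffled$cp, shuffled$party)
stopifnot(all(before == after))
```

The other columns and the row order are left as they were; only the values of `party` move around inside each `cp` group.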

Upvotes: 2

akrun

Reputation: 887118

We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(all)), group by 'cp', sample 'party' within each group, and assign (:=) the result back to the 'party' column.

library(data.table)
setDT(all)[, party := sample(party), by = cp]
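
Since the question mentions repeating the shuffle many times, here is a rough sketch of how that loop might look. The names `dat`, `orig` and `n_perm` are hypothetical, and the seed is only there for reproducibility. Because := modifies the table by reference, nothing is copied per iteration; take a `copy()` first if the original labels are still needed.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# Sample data from the question, renamed from `all` to `dat` (hypothetical name)
dat <- as.data.table(
  data.frame(cp = factor(1:5), party = rep(c("A", "B", "C", "D"), 5))
)

orig <- copy(dat)  # keep the unshuffled labels; := below changes dat in place
n_perm <- 3        # hypothetical number of permutations

for (i in seq_len(n_perm)) {
  # reshuffle party within each cp group, by reference
  dat[, party := sample(party), by = cp]
  # ... compute the permutation statistic on dat here ...
}

# Each (cp, party) combination still occurs exactly once, as in the original
stopifnot(all(dat[, .N, by = .(cp, party)]$N == 1))
```

Reshuffling an already shuffled column is fine for this purpose, as each pass is still a random permutation within the group.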

Upvotes: 4
