Rad

Reputation: 1029

R: randomize data frame columns together by row

I have a data frame in R that I want to randomize, keeping the first column as it is but shuffling the last two columns together, so that values that share a row in these columns still share a row after randomizing. So if I started with this:

1 a b c 
2 d e f 
3 g h i 

when randomized it might look like:

1 a e f 
2 d h i 
3 g b c 

I know that sample works fine, but does it preserve the pairing between the columns?

Upvotes: 2

Views: 1416

Answers (4)

Etienne Low-Décarie

Reputation: 13443

An approach using colwise from plyr for elegant column-wise permutation:

test <- data.frame(matrix(nrow=4,ncol=10,data=1:40))

Load plyr

require(plyr)

Create a column-wise "sample" function

colwise.sample <- colwise(sample)

Apply to the desired columns

permutation.test <- test
permutation.test[,c(1,3,4)] <- colwise.sample(test[,c(1,3,4)])
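One caveat worth spelling out: sampling column by column shuffles each column independently, so values that started in the same row will usually not stay together. A minimal base R sketch of the same idea (lapply stands in for colwise here so that plyr is not needed; the names cols and permuted and the seed are assumptions made for illustration):

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

test <- data.frame(matrix(nrow = 4, ncol = 10, data = 1:40))
cols <- c(1, 3, 4)

permuted <- test
# sample() is called once per column, so each column gets its own permutation
permuted[cols] <- lapply(test[cols], sample)

# each shuffled column still holds the same values, just in a different order
same_values <- all(mapply(function(a, b) identical(sort(a), sort(b)),
                          permuted[cols], test[cols]))
```

This is fine if independent column permutations are what you want, but it does not keep rows paired across the shuffled columns.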

Upvotes: 0

Matt Bannert

Reputation: 28264

What do you mean by "values equivalence"? Honestly, I don't quite get the question, but here's my guess. As you said, you could use sample, but apply it separately to each of your columns, e.g. with apply:

 # create a reproducible example
 test <- data.frame(indx=c(1,2,3),col1=c("a","d","g"),
               col2=c("b","e","h"),col3=c("c","f","i"))

 xyz <- apply(test[,-1],MARGIN=2,sample)
 as.data.frame(xyz)

Upvotes: 0

John Colby

Reputation: 22588

Just sample one column at a time and you'll be fine. For example:

data[,2] = sample(data[,2])
data[,3] = sample(data[,3])
...

If you have many columns, you can extend this like:

data[,-1] = apply(data[,-1], 2, sample)

EDIT: With your clarification about row equivalence, this is just:

data[,-1] = data[sample(nrow(data)),-1]
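As a quick check of that last line, here is a small sketch on data shaped like the example in the question (the column names col1 to col3 and the seed are assumptions; the question does not give any). It draws a single row permutation and applies it to every column except the first, then compares the (col2, col3) pairs before and after:

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

data <- data.frame(col1 = c("a", "d", "g"),
                   col2 = c("b", "e", "h"),
                   col3 = c("c", "f", "i"))

# remember which values sit together in a row before shuffling
pairs_before <- paste(data$col2, data$col3)

# one permutation of the row indices, used for all shuffled columns at once
perm <- sample(nrow(data))
data[, -1] <- data[perm, -1]

pairs_after <- paste(data$col2, data$col3)
```

Because both columns are reordered by the same perm, pairs_after contains the same pairs as pairs_before, possibly in a different order, while col1 is untouched.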

Upvotes: 1

Max

Reputation: 4932

> t <- data.frame(matrix(nrow=4,ncol=10,data=1:40))
> t
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  1  5  9 13 17 21 25 29 33  37
2  2  6 10 14 18 22 26 30 34  38
3  3  7 11 15 19 23 27 31 35  39
4  4  8 12 16 20 24 28 32 36  40
> columns_to_random <- c(8,9,10)
> t[,columns_to_random] <- t[sample(1:nrow(t),size=nrow(t)), columns_to_random]
> t
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  1  5  9 13 17 21 25 32 36  40
2  2  6 10 14 18 22 26 29 33  37
3  3  7 11 15 19 23 27 30 34  38
4  4  8 12 16 20 24 28 31 35  39
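If you need this more than once, the same idea could be wrapped in a small helper. This is only a sketch: the function name shuffle_together and its arguments are hypothetical, and the data frame is called df rather than t to avoid masking the base function t().

```r
# Hypothetical helper: reorder the chosen columns with one shared row permutation
shuffle_together <- function(df, cols) {
  perm <- sample(nrow(df))
  # drop = FALSE keeps a data frame even when cols names a single column
  df[cols] <- df[perm, cols, drop = FALSE]
  df
}

set.seed(123)  # arbitrary seed, only so the run is repeatable
df <- data.frame(matrix(nrow = 4, ncol = 10, data = 1:40))
shuffled <- shuffle_together(df, c("X8", "X9", "X10"))
```

In this test data X9 is always X8 + 4 and X10 is always X8 + 8 within a row, so those differences still holding in shuffled is an easy way to see that the three columns moved together.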

Upvotes: 3
