I'm currently writing a custom function to achieve this, but I was wondering if there was a simple, built-in function in R that would achieve the same goals.
I have data like:
stringVariable1 stringVariable2
string1 a
string1 b
string1 d
string2 e
string2 a
string3 b
I want to shuffle the data in stringVariable2, but without creating duplicates within any group of stringVariable1.
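To make the example reproducible, the data above can be entered as an R data frame. This is a minimal sketch; the name d is just a convenient choice, and stringsAsFactors = FALSE keeps both columns as character vectors (the default since R 4.0.0).

```r
# Example data from the question, one row per observation
d <- data.frame(
  stringVariable1 = c("string1", "string1", "string1", "string2", "string2", "string3"),
  stringVariable2 = c("a", "b", "d", "e", "a", "b"),
  stringsAsFactors = FALSE
)
```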
So this wouldn't be acceptable (as 'b' is duplicated with respect to string1):
stringVariable1 stringVariable2
string1 b
string1 b
string1 d
string2 a
string2 e
string3 d
But this would:
stringVariable1 stringVariable2
string1 b
string1 e
string1 d
string2 a
string2 e
string3 d
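The "no duplicates within a group" rule can be checked mechanically: a row violates it when the same (stringVariable1, stringVariable2) pair has already appeared. The sketch below uses a hypothetical helper, has_group_duplicates(), built on base R's duplicated() for data frames, and applies it to the two example columns above.

```r
# Hypothetical helper: TRUE if any value occurs more than once within
# the same group
has_group_duplicates <- function(group, value) {
  any(duplicated(data.frame(group, value)))
}

groups   <- c("string1", "string1", "string1", "string2", "string2", "string3")
rejected <- c("b", "b", "d", "a", "e", "d")  # 'b' appears twice in string1
accepted <- c("b", "e", "d", "a", "e", "d")

has_group_duplicates(groups, rejected)  # [1] TRUE
has_group_duplicates(groups, accepted)  # [1] FALSE
```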
So essentially I'm trying to randomise stringVariable2 without replacement with respect to each group of stringVariable1. Is creating a custom function the only way to do this?
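If a custom function does turn out to be necessary, one simple approach is rejection sampling: permute the whole column and retry until no group contains a repeated value. The function name shuffle_without_group_duplicates() and its max_tries limit are hypothetical, not part of base R. The retry loop can get slow when valid arrangements are rare, and it cannot succeed at all when, for example, a group has more rows than there are distinct values.

```r
# Hypothetical helper: permute `value` across all rows, retrying until no
# group in `group` receives the same value twice (rejection sampling).
shuffle_without_group_duplicates <- function(group, value, max_tries = 10000) {
  for (i in seq_len(max_tries)) {
    # sample.int() avoids sample()'s special case for a single number
    candidate <- value[sample.int(length(value))]
    if (!any(duplicated(data.frame(group, candidate)))) {
      return(candidate)
    }
  }
  stop("no valid shuffle found after ", max_tries, " tries")
}

d <- data.frame(
  stringVariable1 = c("string1", "string1", "string1", "string2", "string2", "string3"),
  stringVariable2 = c("a", "b", "d", "e", "a", "b"),
  stringsAsFactors = FALSE
)

set.seed(1)
d$shuffled <- shuffle_without_group_duplicates(d$stringVariable1, d$stringVariable2)
```

Unlike a within-group permutation, this lets values move between groups (as 'e' does in the acceptable example), while keeping exactly the original set of values.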
Thanks!
Are the values of stringVariable2 duplicated in the groups of stringVariable1? If not, a group-wise permutation could be performed with something like (d is the name of the data frame containing the data):
d$perm1 <- as.vector(unlist(tapply(d$stringVariable2, d$stringVariable1, sample)))
This applies sampling without replacement (using sample()) to stringVariable2 within every group of stringVariable1 (using tapply()). The resulting list is then converted to a vector using unlist() and as.vector(); the last function just strips the names from the elements of the vector. The permuted values are stored in the column perm1 of the original data frame. Note that tapply() returns the groups in sorted order, so this assumes the rows of d are already sorted by stringVariable1, as they are in the example.
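Because tapply() returns its results grouped in sorted level order, a variant that does not depend on row order can use base R's ave(), which writes each group's result back into the original row positions. This is a sketch; the data below is the question's data with its rows deliberately reordered.

```r
d <- data.frame(
  stringVariable1 = c("string2", "string1", "string1", "string3", "string2", "string1"),
  stringVariable2 = c("e", "a", "b", "b", "a", "d"),
  stringsAsFactors = FALSE
)

# ave() splits stringVariable2 by stringVariable1, permutes each piece and
# puts the results back in the original rows, so no sorting is needed.
# Indexing with sample.int() sidesteps sample()'s special case for a single
# numeric value (a precaution only; these values are strings).
d$perm1 <- ave(d$stringVariable2, d$stringVariable1,
               FUN = function(x) x[sample.int(length(x))])
```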