Rambatino

Reputation: 4914

sample column without duplicates

I'm currently writing a custom function to achieve this, but I was wondering if there was a simple, built-in function in R that would achieve the same goals.

I have data like:

stringVariable1     stringVariable2

string1             a
string1             b
string1             d
string2             e
string2             a
string3             b

And I want to shuffle the data in stringVariable2, but I don't want any value to appear more than once within the same stringVariable1 group.

So this wouldn't be acceptable (as 'b' is duplicated with respect to string1):

stringVariable1     stringVariable2

string1             b
string1             b
string1             d
string2             a
string2             e
string3             d

But this would:

stringVariable1     stringVariable2

string1             b
string1             e
string1             d
string2             a
string2             e
string3             d

So essentially I'm trying to randomise stringVariable2 without replacement with respect to each stringVariable1 group. Is creating a custom function the only way to do this?

Thanks!

Upvotes: 2

Views: 458

Answers (1)

user2357031


Are the values of stringVariable2 duplicated in the groups of stringVariable1? If not, a group-wise permutation could be performed with something like (d is the name of the data frame containing the data):

d$perm1 <- as.vector(unlist(tapply(d$stringVariable2, d$stringVariable1, sample)))

Here tapply() applies sample(), which samples without replacement by default, to stringVariable2 within every group of stringVariable1. The resulting list is then flattened to a vector with unlist(), and as.vector() just strips the names of the observations from that vector. The permuted values are then stored in the column perm1 of the original data frame. Note that tapply() returns the groups in sorted order of stringVariable1, so the assignment only lines up if the rows of d are already sorted by that column, as they are in the example data.
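As a minimal sketch of the same group-wise permutation on the data from the question, the following uses ave() instead of tapply(), which is a swapped-in alternative that writes each shuffled group back in the original row order and so does not depend on the rows being sorted. The helper name shuffle and the seed are assumptions made for illustration, not part of the original answer.

```r
# Example data from the question (d is the data frame name used above)
d <- data.frame(
  stringVariable1 = c("string1", "string1", "string1",
                      "string2", "string2", "string3"),
  stringVariable2 = c("a", "b", "d", "e", "a", "b"),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, only to make the shuffle reproducible

# Hypothetical helper: permute x by index, which avoids sample()'s
# special case of treating a single number n as 1:n
shuffle <- function(x) x[sample.int(length(x))]

# ave() applies shuffle() within each stringVariable1 group and returns
# the values in the original row order
d$perm1 <- ave(d$stringVariable2, d$stringVariable1, FUN = shuffle)

# Check: TRUE here would mean some value repeats within a group
any(duplicated(d[, c("stringVariable1", "perm1")]))
```

Because each group is only reordered internally, the check can never report a duplicate as long as the original data had none within a group. This does not move values between groups, so if that is required a different approach would be needed.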

Upvotes: 2
