AD_R

Reputation: 57

Random sampling without replacement of one variable within another variable: Using ddply() function in {plyr} package - R

I've been racking my brain over this problem, with no expedient solution in sight.

I have a dataset over which I am trying to permute one variable (an attribute) within another variable (a location), irrespective of an object (an item).

Here's a snippet of the data:

         ID_FIELD   SPCD       Total
              1177   833  428.286591
             11383   691 1175.846712
             24081   316  137.042979
             11383   318  177.335481
              1177    71  166.629921
             24081   110 1170.012216
              1177    12    8.379811
             30284   541  585.039300
             24081   746  188.808428
             24081   531  196.142482
              1177   111   47.258113
              1177    12  198.443376
             11383   827   16.095224

Using the ddply() function in the plyr package, with R version 3.2.0, I've submitted the following code:

ddply(data,.(Total,ID_FIELD),sample)

Here, I am trying to permute Total (the attribute) across SPCD (the item) within ID_FIELD (the location). After running the ddply() code twice in sequence, the result is exactly the same each time, which is not what I want. I'd like the process to be randomized on every run of the function (i.e. a new shuffling of Total with each call to ddply()).

Any clues as to how to accomplish this? A speedy process would be appreciated as well, given that the application is with a large dataset. I am at my wit's end.

Many thanks.

Upvotes: 0

Views: 71

Answers (1)

Hong Ooi

Reputation: 57696

Using plyr:

ddply(data, .(ID_FIELD), function(df) df[sample(nrow(df)),])

Using dplyr, which has a sampling function supplied:

library(dplyr)
data %>% group_by(ID_FIELD) %>% sample_frac
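
Both calls above reorder whole rows within each ID_FIELD, so every Total stays attached to its SPCD. Note also that in the snippet from the question every Total value is unique, so grouping by .(Total, ID_FIELD) leaves one row per group and nothing to shuffle within it. If the aim is to reassign Total across SPCD within a location, one possible sketch in base R is below. It assumes the three columns shown in the question; the helper name permute is hypothetical and only for illustration. It permutes by index because sample(x) on a single number draws from 1:x, which would break for a location with only one row (30284 in the snippet).

```r
# Snippet of the data from the question
data <- data.frame(
  ID_FIELD = c(1177, 11383, 24081, 11383, 1177, 24081, 1177,
               30284, 24081, 24081, 1177, 1177, 11383),
  SPCD     = c(833, 691, 316, 318, 71, 110, 12, 541, 746, 531, 111, 12, 827),
  Total    = c(428.286591, 1175.846712, 137.042979, 177.335481, 166.629921,
               1170.012216, 8.379811, 585.039300, 188.808428, 196.142482,
               47.258113, 198.443376, 16.095224)
)

# Hypothetical helper: permute a vector by index, so that a group of
# length 1 is returned unchanged instead of being treated as 1:x.
permute <- function(x) x[sample.int(length(x))]

# Shuffle Total within each ID_FIELD; ID_FIELD and SPCD stay where they are.
shuffled <- data
shuffled$Total <- ave(data$Total, data$ID_FIELD, FUN = permute)
```

Each run draws a new permutation unless set.seed() is called beforehand with the same seed. Because ave() works on plain vectors, this should also scale reasonably to a large dataset, though that has not been benchmarked here.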

Upvotes: 2
