Reputation: 57
I've been racking my brain over this problem, with no expedient solution in sight.
I have a dataset over which I am trying to permute one variable (an attribute) within another variable (a location), irrespective of an object (an item).
Here's a snippet of the data:
ID_FIELD SPCD Total
1177 833 428.286591
11383 691 1175.846712
24081 316 137.042979
11383 318 177.335481
1177 71 166.629921
24081 110 1170.012216
1177 12 8.379811
30284 541 585.039300
24081 746 188.808428
24081 531 196.142482
1177 111 47.258113
1177 12 198.443376
11383 827 16.095224
Using the ddply() function in the plyr package, with R version 3.2.0, I've submitted the following code:
ddply(data,.(Total,ID_FIELD),sample)
Here, I am trying to permute Total (the attribute) across SPCD (the item) within ID_FIELD (the location). After running the ddply() code twice in sequence, the result is exactly the same as before, which is not what I want. I'd like this process randomized on each run of the function (i.e. a new shuffling of Total on each submission of ddply()).
Any clues as to how to accomplish this? A fast approach would be appreciated as well, since this will be applied to a large dataset. I am at my wit's end.
Many thanks.
Upvotes: 0
Views: 71
Reputation: 57696
Using plyr:
ddply(data, .(ID_FIELD), function(df) df[sample(nrow(df)),])
Using dplyr, which has a sampling function supplied:
library(dplyr)
data %>% group_by(ID_FIELD) %>% sample_frac()
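Note that both snippets above reorder whole rows within each ID_FIELD, so every Total stays attached to its original SPCD. If the goal is to reassign Total across SPCD within a location, as the question seems to describe, something like the following base R sketch might be closer. The helper name permute_total is hypothetical, and the data frame is just the sample from the question retyped by hand. As far as I can tell, the original call appeared unchanged because grouping by Total leaves essentially one row per group, and sample() on a data frame permutes its columns rather than its rows.

```r
# Sample data retyped from the question
data <- data.frame(
  ID_FIELD = c(1177, 11383, 24081, 11383, 1177, 24081, 1177,
               30284, 24081, 24081, 1177, 1177, 11383),
  SPCD     = c(833, 691, 316, 318, 71, 110, 12,
               541, 746, 531, 111, 12, 827),
  Total    = c(428.286591, 1175.846712, 137.042979, 177.335481,
               166.629921, 1170.012216, 8.379811, 585.039300,
               188.808428, 196.142482, 47.258113, 198.443376,
               16.095224)
)

# Hypothetical helper: shuffle Total within each ID_FIELD,
# leaving ID_FIELD and SPCD in their original row positions.
permute_total <- function(df) {
  df$Total <- ave(
    df$Total, df$ID_FIELD,
    # Index with sample.int() rather than calling sample(x) directly:
    # sample(x) on a single number x draws from 1:x, which would
    # corrupt locations that have only one row (e.g. 30284 here).
    FUN = function(x) x[sample.int(length(x))]
  )
  df
}

# Each call draws a fresh permutation unless a seed is fixed beforehand
shuffled <- permute_total(data)
```

ave() is vectorised over the grouping factor, so this should hold up reasonably on a large dataset, though I have not benchmarked it against the plyr or dplyr versions.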
Upvotes: 2