Reputation: 357
I have a dataframe that looks like this:
X Y
1 1
1 2
1 3
1 4
2 1
2 2
2 3
2 4
3 1
3 2
3 3
3 4
4 1
4 2
4 3
4 4
Now I would like to obtain n samples of m pairs (x,y) such that no value is repeated across the pairs of a sample, whether it appears as x or as y.
For example, for m=2:
[(1,3),(4,3)] is not a valid solution (3 is repeated in y);
[(1,3),(4,1)] is not a valid solution either (1 appears as x in the first pair and as y in the second);
[(1,2),(3,4)] and [(1,1),(2,2)] are valid solutions.
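The rule implied by these examples can be written as a small check. This is only a sketch: the function name `is_valid_sample` is made up here, and it assumes that a sample is a list of length-2 vectors and that a value may repeat inside one pair (as in (1,1)) but not across pairs.

```r
# Hypothetical helper: TRUE if no value is shared between different pairs
is_valid_sample <- function(pairs) {
  # count each value once per pair, so (1,1) contributes a single 1
  vals <- unlist(lapply(pairs, unique))
  anyDuplicated(vals) == 0
}

is_valid_sample(list(c(1, 3), c(4, 3)))  # FALSE: 3 is repeated in y
is_valid_sample(list(c(1, 3), c(4, 1)))  # FALSE: 1 is x in one pair, y in the other
is_valid_sample(list(c(1, 2), c(3, 4)))  # TRUE
is_valid_sample(list(c(1, 1), c(2, 2)))  # TRUE
```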
I have been trying this, but I do not know how to find and remove duplicates of x in y.
y <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
x <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
df <- data.frame(x, y)
subset(df[sample(nrow(df)),], !duplicated(x) & !duplicated(y))
Upvotes: 1
Views: 55
Reputation: 102700
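You could try the code below:
m <- 2
n <- 5
res <- replicate(n, {
  # draw m distinct x values
  x <- sample(unique(df$X), m)
  # y is either m values not used as x, or x itself (pairs of the form (a, a))
  y <- list(sample(setdiff(df$Y, x), m), x)[[sample(2, 1)]]
  Map(c, x, y)
}, simplify = FALSE)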
DATA
df <- rev(expand.grid(Y=1:4,X=1:4))
Upvotes: 1
Reputation: 1464
You could probably start with something like this
res <- cbind(df[sample(nrow(df)),], df[sample(nrow(df)),])
and then this
res[,c("x1NotOk", "y1NotOk") ] <- t(apply(res, 1, function(x) x[1:2] %in% x[3:4]))
which will give you something like this
> res
x y x.1 y.1 x1NotOk y1NotOk
4 1 4 2 3 FALSE FALSE
10 3 2 1 2 FALSE TRUE
5 2 1 4 3 FALSE FALSE
2 1 2 2 1 TRUE TRUE
16 4 4 1 1 FALSE FALSE
....
After that, drop the rows where either x1NotOk or y1NotOk is TRUE, e.g. using the row index
-which(res$x1NotOk | res$y1NotOk)
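A minimal end-to-end sketch of this approach for m = 2 follows. The column names x1, y1, x2, y2 and the variable names `first`, `second` and `valid` are chosen here for illustration. The filter uses a logical index instead of `-which(...)`, because negative indexing with an empty `which()` result would select no rows at all.

```r
x <- rep(1:4, each = 4)
y <- rep(1:4, times = 4)
df <- data.frame(x, y)

set.seed(1)
# two independent shuffles of the rows give candidate (pair 1, pair 2) combinations
first  <- df[sample(nrow(df)), ]
second <- df[sample(nrow(df)), ]
res <- data.frame(x1 = first$x, y1 = first$y, x2 = second$x, y2 = second$y)

# flag candidates where a value of the first pair reappears in the second pair
x1NotOk <- res$x1 == res$x2 | res$x1 == res$y2
y1NotOk <- res$y1 == res$x2 | res$y1 == res$y2

# keep only the candidates with no shared value between the two pairs
valid <- res[!(x1NotOk | y1NotOk), ]
```

The number of rows left in `valid` varies with the seed, so more candidates may need to be generated to reach n samples.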
Upvotes: 1
Reputation: 174506
Here's a function that generates a list of n samples of m elements taken without repeats from vectors x and y:
unique_sets <- function(x, y, m, n)
{
  lapply(seq(n), function(z)
  {
    xs <- sample(x, m)
    ys <- sample(unique(y[!(y %in% xs)]), m)
    mapply(c, xs, ys, SIMPLIFY = FALSE)
  })
}
So now you can do
y <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
x <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
set.seed(69)
unique_sets(x, y, m = 2, n = 3)
#> [[1]]
#> [[1]][[1]]
#> [1] 4 2
#>
#> [[1]][[2]]
#> [1] 1 3
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 4 1
#>
#> [[2]][[2]]
#> [1] 2 3
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 4 3
#>
#> [[3]][[2]]
#> [1] 2 1
Created on 2020-04-16 by the reprex package (v0.3.0)
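One caveat: `sample(x, m)` draws from the full vector, so when x contains repeats (as in the example data) the same x value can be drawn twice within a sample. A minimal sketch of a variant that samples from the distinct values instead is shown below. The name `unique_sets_distinct` and the `stopifnot` guard are additions made here, and it assumes, like the function above, that x and y values must not overlap at all within a sample.

```r
# Hypothetical variant: draw x from the distinct values only
unique_sets_distinct <- function(x, y, m, n) {
  ux <- unique(x)
  lapply(seq_len(n), function(i) {
    # index-based sampling avoids sample()'s special case for a single number
    xs <- ux[sample.int(length(ux), m)]
    # distinct y values that were not already used as x
    pool <- setdiff(y, xs)
    stopifnot(length(pool) >= m)
    ys <- pool[sample.int(length(pool), m)]
    mapply(c, xs, ys, SIMPLIFY = FALSE)
  })
}

x <- rep(1:4, each = 4)
y <- rep(1:4, times = 4)
set.seed(69)
out <- unique_sets_distinct(x, y, m = 2, n = 3)

# every sample should use 2 * m = 4 distinct values
sapply(out, function(s) length(unique(unlist(s))))
#> [1] 4 4 4
```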
Upvotes: 1