I have a very large data frame that contains 100 rows and 400000 columns.
To shuffle each column independently, I can simply do:
df <- apply(df, 2, sample)  # note: apply() returns a matrix, not a data frame
But I want every two columns to be sampled together. For example, if col1 is originally c(1,2,3,4,5) and col2 is c(6,7,8,9,10), and after resampling col1 becomes c(1,3,2,4,5), I want col2 to become c(6,8,7,9,10), following the same resampling pattern as col1. The same goes for col3 & col4, col5 & col6, etc.
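To make "sampled together" concrete, here is a minimal sketch for a single pair, using the toy columns from the example above: draw one random permutation of the row indices and apply it to both columns, so values that shared a row still share a row afterwards.

```r
df <- data.frame(col1 = 1:5, col2 = 6:10)

# draw ONE random row order and apply it to both columns of the pair
idx <- sample(nrow(df))
pair <- df[idx, c("col1", "col2")]
pair
```

Doing this in a loop over all 200000 pairs is what makes a naive for loop slow.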
I wrote a for loop to do this, which takes forever. Is there a better way? Thanks!
You might try this: split the data frame into groups of two columns with split.default, shuffle the rows of each sub data frame with a single permutation, and then bind the pieces back together:
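df <- data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15)
index <- seq_len(nrow(df))
# group ids 0, 0, 1, 1, 2, ... put (col1, col2), (col3, col4), ... together
groups <- split.default(df, (seq_along(df) - 1) %/% 2)
# shuffle the rows of each group with one permutation shared by its columns
shuffled <- lapply(groups, function(sdf) sdf[sample(index), , drop = FALSE])
# drop the list names so the original column names are kept
cbind.data.frame(setNames(shuffled, NULL))
# one possible result (col1/col2 move together, col3 is shuffled on its own):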
# col1 col2 col3
#5 5 10 12
#4 4 9 11
#1 1 6 15
#2 2 7 14
#3 3 8 13
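With 400000 columns, split.default produces 200000 small data frames, which may still be slow. If all columns are numeric, a possible vectorized alternative is to work on a matrix and build every permutation in one call. A minimal sketch, assuming the data survives as.matrix without losing type information (the function name shuffle_pairs is hypothetical, not from any package):

```r
# Hypothetical helper: shuffle the rows of a matrix, sharing one random
# permutation between columns 1&2, 3&4, 5&6, ... (an odd last column is
# shuffled on its own).
shuffle_pairs <- function(m) {
  n <- nrow(m)
  p <- ncol(m)
  pair <- (seq_len(p) - 1) %/% 2 + 1   # pair id for each column
  npairs <- max(pair)
  # sort by pair id, then by random keys: each block of n positions becomes
  # a random permutation of that block, i.e. one permutation per pair
  idx <- order(rep(seq_len(npairs), each = n), runif(n * npairs))
  perm <- matrix((idx - 1) %% n + 1, nrow = n)   # n x npairs, values 1..n
  # give every column its pair's permutation, then turn (row, col) into
  # linear indices into m
  lin <- perm[, pair, drop = FALSE] + rep((seq_len(p) - 1) * n, each = n)
  out <- m
  # as.vector() matters: a 2-column index matrix would be read as (row, col)
  out[] <- m[as.vector(lin)]
  out
}

m <- matrix(1:20, nrow = 5, dimnames = list(NULL, paste0("col", 1:4)))
s <- shuffle_pairs(m)
s
```

For the data frame in the question, something like as.data.frame(shuffle_pairs(as.matrix(df))) should work; I have not timed it at the 100 x 400000 scale.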