Vergil

Reputation: 59

R: How to randomly sample one value from each column and bootstrap

I have a data set with 18 columns and 100 rows, where the columns are 18 students and the rows are their grades on 100 exams. For each student, I want to randomly sample one grade from that student's 100 grades. In other words, I want a sample with 18 columns and just 1 row. I have tried the apply and sample functions, but I could not get either to work, and I don't know why.

bs = data.frame(matrix(nrow=1,ncol=18))
for (i in colnames(high)){
  bs[,i]=sample(high[,i],1,replace=TRUE)
}

as.data.frame(lapply(high[,i],sample,18,replace=TRUE))

Upvotes: 2

Views: 1371

Answers (4)

Nick Pinkham

Reputation: 1

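If your data is in long format (one row per grade, with a group column identifying the student), you can shuffle the rows of your data frame: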

df <- df[sample(1:nrow(df)),]

Then take the first observation of each group in the shuffled data frame:

df.pick <- df[!duplicated(df$group) , ]
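The question's data is in wide format (one column per student), so the two steps above would need a reshape first. A minimal sketch, assuming a small made-up data frame named wide and using base R's stack() to build the group column; the names wide, long, and pick are illustrative and not from the question:

```r
# Hypothetical wide data: 4 exams (rows) x 3 students (columns)
set.seed(1)
wide <- data.frame(
  student1 = c(50, 45, 86, 30),
  student2 = c(56, 78, 63, 58),
  student3 = c(88, 60, 75, 93)
)

# stack() reshapes to long format with columns `values` and `ind`
long <- stack(wide)
names(long) <- c("grade", "group")

# Shuffle the rows, then keep the first row seen for each group
long <- long[sample(nrow(long)), ]
pick <- long[!duplicated(long$group), ]
pick
```

Each student ends up with exactly one randomly chosen grade, one row per student.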

Upvotes: 0

RD_

Reputation: 324

You can use sample() to randomly select one value from each column.

I have created a small sample data set here. In future, it helps to include sample data in the question so the problem is easier to understand.

# sample data
df <- data.frame(
  student1 = c(50, 45, 86, 30),
  student2 = c(56, 78, 63, 58),
  student3 = c(88, 60, 75, 93),
  student4 = c(87, 33, 49, 11),
  student5 = c(85, 96, 55, 64)
)

Then loop through each student (column), randomly choose one of that student's grades, and store it in a vector. As a final step, since you want a data frame, convert the vector to a one-row data frame.

# column names
students <- colnames(df)

# empty vector
vals <- c()

for(s in students) {
  grade <- sample(df[[s]], 1)
  vals <- c(vals, grade)
}

finalDF <- as.data.frame(t(vals))
names(finalDF) <- students
finalDF

The output from two runs:

  student1 student2 student3 student4 student5
1       45       78       93       87       64

  student1 student2 student3 student4 student5
1       45       63       93       87       96

The other answers are really smart, but nonetheless, I hope this helps!

Upvotes: 1

Darren Tsai

Reputation: 35584

Try this

apply(data, 2, sample, size = 1)

Testing with @StupidWolf's data:

set.seed(101)
apply(high, 2, sample, size = 1)

#   student1   student2   student3   student4   student5   student6   student7   student8   student9  student10  student11  student12  student13  student14  student15  student16  student17  student18
# 0.57256477 0.84338121 0.71225050 0.56432392 0.23865929 0.23563641 0.51903694 0.36692427 0.51577410 0.45780908 0.19434773 0.70247028 0.60383059 0.25451088 0.78583242 0.86241707 0.05360842 0.61892604
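Since the title mentions bootstrapping, the same call can be repeated to get several resampled rows. A minimal sketch, assuming the simulated high matrix from @StupidWolf's answer; B (the number of replicates) and the name boot are hypothetical choices, not from the question:

```r
# Simulated data: 100 exams (rows) x 18 students (columns)
set.seed(100)
high <- matrix(runif(100 * 18), ncol = 18)
colnames(high) <- paste0("student", 1:18)

B <- 5  # hypothetical number of bootstrap replicates

# replicate() returns an 18 x B matrix (one column per replicate),
# so transpose it to get one row per replicate
boot <- t(replicate(B, apply(high, 2, sample, size = 1)))
dim(boot)
```

Each row of boot is one draw of 18 grades, one per student, sampled independently within each column.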

Upvotes: 1

StupidWolf

Reputation: 46908

Let's say your data looks like this:

set.seed(100)
high = matrix(runif(100*18),ncol=18)
colnames(high) = paste0("student",1:18)
rownames(high) = paste0("exam",1:100)

head(high)
        student1   student2  student3  student4  student5  student6   student7
exam1 0.30776611 0.32741508 0.3695961 0.8495923 0.5112374 0.2202326 0.03176634
exam2 0.25767250 0.38947869 0.9563228 0.6532260 0.2777107 0.7431595 0.57970549
exam3 0.55232243 0.04105275 0.9135767 0.9508858 0.3606569 0.3059573 0.15420484
exam4 0.05638315 0.36139663 0.8233363 0.6172230 0.4375279 0.4022088 0.12527050

What you want to do is sample row indices from 1 to 100, 18 times with replacement (to be similar to a bootstrap, thanks to @H1 for pointing this out):

set.seed(101)
take=sample(1:100,18,replace=TRUE)
take
 [1] 73 57 46 95 81 58 95 61 60 59 99  3 32  9 96 99 99 98

As you can see above, 99 is drawn several times because replace=TRUE. We take the 73rd entry of column 1, the 57th entry of column 2, and so on. This can be done by indexing the matrix with a two-column matrix of (row, column) pairs:

high[cbind(take,1:18)]
 [1] 0.57256477 0.84338121 0.71225050 0.56432392 0.23865929 0.23563641
 [7] 0.51903694 0.36692427 0.51577410 0.45780908 0.19434773 0.70247028
[13] 0.60383059 0.25451088 0.78583242 0.86241707 0.05360842 0.61892604
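To make the matrix-indexing step easier to follow, here is a small sketch on a toy matrix, followed by one way to turn the result into the one-row data frame the question asks for. The names m, rows, picked, and one_row are illustrative only:

```r
# Toy matrix filled column by column: column 1 is 1:4, column 2 is 5:8, column 3 is 9:12
m <- matrix(1:12, nrow = 4)
rows <- c(2, 4, 1)

# Each row of cbind(rows, 1:3) is a (row, column) pair: m[2, 1], m[4, 2], m[1, 3]
picked <- m[cbind(rows, 1:3)]
picked  # 2 8 9

# Same idea on the simulated grades, kept as a named one-row data frame
set.seed(100)
high <- matrix(runif(100 * 18), ncol = 18)
colnames(high) <- paste0("student", 1:18)
take <- sample(1:100, 18, replace = TRUE)
one_row <- as.data.frame(t(setNames(high[cbind(take, 1:18)], colnames(high))))
dim(one_row)  # 1 18
```

Indexing with a two-column matrix picks one element per (row, column) pair, so no loop over the columns is needed.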

Upvotes: 1
