Reputation: 195
I have a table containing data on subjects who each took two versions of a test. I would like to write some code that randomly selects, for each subject, which version of the test to include in the final table and which to discard. Here is some example data:
ID test1 test2
38762 21 36
37874 17 20
37813 15 17
37738 23 31
37470 25 36
37308 31 32
37039 25 16
36045 16 9
I need this to be as close to random as possible, so any help would be greatly appreciated.
Thanks in advance
EDIT: Desired output:
row.names ID test1
67 38762 21
218 36045 16
row.names ID test2
108 37874 20
114 37813 17
117 37738 31
140 37470 36
152 37308 32
175 37039 16
Upvotes: 1
Views: 1775
Reputation: 3492
> df=NULL
> df$ID=sample(38700:38800,10,F)
> df$test1=sample(15:25,10,F)
> df$test2=sample(15:35,10,F)
> df=as.data.frame(df)
> df
ID test1 test2
1 38784 24 19
2 38747 15 15
3 38791 16 34
4 38721 25 32
5 38769 20 23
6 38706 21 26
7 38702 17 29
8 38761 22 28
9 38763 19 25
10 38740 23 16
> df$ran=sample(2,nrow(df),T)
> df$test=ifelse(df$ran==1,df$test1,df$test2)
> df
ID test1 test2 ran test
1 38784 24 19 1 24
2 38747 15 15 1 15
3 38791 16 34 1 16
4 38721 25 32 1 25
5 38769 20 23 1 20
6 38706 21 26 1 21
7 38702 17 29 2 29
8 38761 22 28 1 22
9 38763 19 25 1 19
10 38740 23 16 2 16
> df$testchosen=ifelse(df$ran==1,"test1","test2")
> df
ID test1 test2 ran test testchosen
1 38784 24 19 1 24 test1
2 38747 15 15 1 15 test1
3 38791 16 34 1 16 test1
4 38721 25 32 1 25 test1
5 38769 20 23 1 20 test1
6 38706 21 26 1 21 test1
7 38702 17 29 2 29 test2
8 38761 22 28 1 22 test1
9 38763 19 25 1 19 test1
10 38740 23 16 2 16 test2
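The session above keeps everything in one table. A minimal sketch of the same idea applied to the example data from the question, with the result split into one table per version as in the desired output, might look like this. The seed value and the name `chosen` are assumptions made for illustration; the seed only makes the draw repeatable, and the split tables carry a `test` column rather than `test1`/`test2`.

```r
# Example data copied from the question.
df <- data.frame(
  ID    = c(38762, 37874, 37813, 37738, 37470, 37308, 37039, 36045),
  test1 = c(21, 17, 15, 23, 25, 31, 25, 16),
  test2 = c(36, 20, 17, 31, 36, 32, 16, 9)
)

set.seed(42)  # arbitrary seed (assumption), only so the draw can be repeated

# One independent draw per row: 1 keeps test1, 2 keeps test2.
df$ran <- sample(2, nrow(df), replace = TRUE)
df$test <- ifelse(df$ran == 1, df$test1, df$test2)
df$testchosen <- ifelse(df$ran == 1, "test1", "test2")

# One table per chosen version, holding only the ID and the kept score.
chosen <- split(df[, c("ID", "test")], df$testchosen)
chosen
```

Because each row is drawn independently, the two tables will usually differ in size, and with a small table one of them can even be empty.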
Upvotes: 1
Reputation: 133
You could do something like this: start by putting your three columns in a data frame, if they aren't already. Then subset that data frame according to a randomly generated vector of 0s and 1s.
df <- data.frame(ID, test1, test2)
# make a vector of 0s and 1s with one entry per row of df
ran <- sample(c(0, 1), nrow(df), replace = TRUE)
# rows drawn as 0 keep test1, rows drawn as 1 keep test2
group1 <- subset(df, subset = ran == 0, select = c(ID, test1))
group2 <- subset(df, subset = ran == 1, select = c(ID, test2))
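Independent draws like the ones above can leave the two groups with quite different sizes. If equal group sizes matter (the question does not ask for this, so treat it as an optional variation), one possible sketch is to shuffle a balanced vector of 0s and 1s instead. The data are the example rows from the question.

```r
# Example data copied from the question.
df <- data.frame(
  ID    = c(38762, 37874, 37813, 37738, 37470, 37308, 37039, 36045),
  test1 = c(21, 17, 15, 23, 25, 31, 25, 16),
  test2 = c(36, 20, 17, 31, 36, 32, 16, 9)
)

# Build 0, 1, 0, 1, ... with one entry per row, then shuffle it.
# With an even number of rows this gives exactly half 0s and half 1s.
ran <- sample(rep(c(0, 1), length.out = nrow(df)))

# Rows drawn as 0 keep test1, rows drawn as 1 keep test2.
group1 <- subset(df, subset = ran == 0, select = c(ID, test1))
group2 <- subset(df, subset = ran == 1, select = c(ID, test2))
```

Every ID lands in exactly one of `group1` and `group2`, and which rows end up in which group is still random.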
Upvotes: 1