googleplex101

Reputation: 195

Randomly selecting between two columns of data in a table in R

I have a table containing data on subjects who each took two versions of a test. I would like to write some code that randomly selects, for each subject, which version of the test to include in the final table and which to discard. Here is some example data:

ID     test1    test2
38762   21       36
37874   17       20
37813   15       17
37738   23       31
37470   25       36
37308   31       32
37039   25       16
36045   16        9

I need this to be as close to random as possible, so any help would be greatly appreciated.

Thanks in advance

EDIT: Desired output:

row.names   ID  test1
    67  38762   21
    218 36045   16


row.names   ID  test2
    108 37874   20
    114 37813   17
    117 37738   31
    140 37470   36
    152 37308   32
    175 37039   16

Upvotes: 1

Views: 1775

Answers (2)

Ajay Ohri

Reputation: 3492

> df=NULL
> df$ID=sample(38700:38800,10,F)
> df$test1=sample(15:25,10,F)
> df$test2=sample(15:35,10,F)
> df=as.data.frame(df)
> df
      ID test1 test2
1  38784    24    19
2  38747    15    15
3  38791    16    34
4  38721    25    32
5  38769    20    23
6  38706    21    26
7  38702    17    29
8  38761    22    28
9  38763    19    25
10 38740    23    16
> df$ran=sample(2,nrow(df),T)
> df$test=ifelse(df$ran==1,df$test1,df$test2)
> df
      ID test1 test2 ran test
1  38784    24    19   1   24
2  38747    15    15   1   15
3  38791    16    34   1   16
4  38721    25    32   1   25
5  38769    20    23   1   20
6  38706    21    26   1   21
7  38702    17    29   2   29
8  38761    22    28   1   22
9  38763    19    25   1   19
10 38740    23    16   2   16
> df$testchosen=ifelse(df$ran==1,"test1","test2")
> df
      ID test1 test2 ran test testchosen
1  38784    24    19   1   24      test1
2  38747    15    15   1   15      test1
3  38791    16    34   1   16      test1
4  38721    25    32   1   25      test1
5  38769    20    23   1   20      test1
6  38706    21    26   1   21      test1
7  38702    17    29   2   29      test2
8  38761    22    28   1   22      test1
9  38763    19    25   1   19      test1
10 38740    23    16   2   16      test2
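The transcript above uses made-up data and an unseeded draw, so its output cannot be reproduced exactly. As a minimal sketch of the same per-row selection applied to the example data from the question, the following picks one test per subject and records which one was kept. The names `scores`, `pick` and `test_cols`, and the seed value, are illustrative assumptions and not part of the original answer; matrix indexing is swapped in for `ifelse()` here, and either works for two columns.

```r
# Example data copied from the question
scores <- data.frame(
  ID    = c(38762, 37874, 37813, 37738, 37470, 37308, 37039, 36045),
  test1 = c(21, 17, 15, 23, 25, 31, 25, 16),
  test2 = c(36, 20, 17, 31, 36, 32, 16, 9)
)

set.seed(42)  # arbitrary seed (an assumption), only so the draw can be repeated

# Draw 1 or 2 for each row, independently and with equal probability
pick <- sample(1:2, nrow(scores), replace = TRUE)

# Matrix indexing: for row i, take column pick[i] of the two score columns
test_cols <- as.matrix(scores[, c("test1", "test2")])
scores$test <- test_cols[cbind(seq_len(nrow(scores)), pick)]

# Record which version was kept for each subject
scores$testchosen <- c("test1", "test2")[pick]

print(scores)
```

Dropping the `set.seed()` call gives a fresh draw on every run, which is probably what is wanted once the code has been checked.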

Upvotes: 1

Michael Kaiser

Reputation: 133

You could do something like this: start out by making your three columns a data frame, if they aren't already. Then subset that data frame according to a random vector of 0s and 1s that you generate.

 # Combine the three vectors into a data frame (cbind() would give a matrix)
 df <- data.frame(ID, test1, test2)
 # Make a vector of 0s and 1s with one entry per row of df
 ran <- sample(c(0, 1), nrow(df), replace = TRUE)

 # Rows drawn as 0 keep test1; rows drawn as 1 keep test2
 group1 <- subset(df, subset = ran == 0, select = c(ID, test1))
 group2 <- subset(df, subset = ran == 1, select = c(ID, test2))
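To show how this split produces the two tables asked for in the question, here is a self-contained sketch using the question's example data. The names `scores` and `ran_balanced` and the seed value are assumptions made for illustration. The balanced variant at the end is an optional alternative, not something the question asks for: it shuffles a fixed set of 0s and 1s so the two groups have the same size.

```r
# Example data copied from the question
scores <- data.frame(
  ID    = c(38762, 37874, 37813, 37738, 37470, 37308, 37039, 36045),
  test1 = c(21, 17, 15, 23, 25, 31, 25, 16),
  test2 = c(36, 20, 17, 31, 36, 32, 16, 9)
)

set.seed(1)  # arbitrary seed (an assumption), only for repeatability

# Independent coin flip per row: 0 keeps test1, 1 keeps test2
ran <- sample(c(0, 1), nrow(scores), replace = TRUE)

group1 <- scores[ran == 0, c("ID", "test1")]
group2 <- scores[ran == 1, c("ID", "test2")]

# Optional variant: shuffle an equal number of 0s and 1s,
# so each test version is kept for half of the subjects
ran_balanced <- sample(rep(c(0, 1), length.out = nrow(scores)))

balanced1 <- scores[ran_balanced == 0, c("ID", "test1")]
balanced2 <- scores[ran_balanced == 1, c("ID", "test2")]
```

With independent flips the group sizes vary from run to run, as in the desired output shown in the question (2 and 6 rows); the balanced variant always gives 4 and 4 for these eight subjects.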

Upvotes: 1
