For each group assign different values from vector

Question

I am trying to generate a fake dataset for testing.

It was easy enough to generate the columns that exist in all combinations:

subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD)

testdata <- expand.grid(subject, visit, isotype)

names(testdata) <- c("subject", "visit", "isotype")

Now I need to create two more columns; "positivity" with a particular value for each group in "visit", and "response" with an random integer with a range dependent on each group in "visit".

For "positivity", I could do it this way:

testdata[testdata$visit == "D0", c("positivity")] <- NA
testdata[testdata$visit == "D100", c("positivity")] <- 1
testdata[testdata$visit == "D500", c("positivity")] <- 0

and for "response", I could do it this way:

testdata[testdata$visit == "D0", c("response")] <- sample(1:100, 1)
testdata[testdata$visit == "D100", c("response")] <- sample(20000:30000, 1)
testdata[testdata$visit == "D500", c("response")] <- sample(1:100, 1)

but in reality I have many more unique observations in "visit" than this and that would take forever. I was hoping I could use dplyr and group_by to loop through each group and assign "positivity" from a vector since the length of that vector should be equal to the number of groups in "visit" and assign "response" with a vector of ranges for the sample method.

positivityvalues <- c(NA, 1, 0)
responseranges <- c(1:100, 1:500, 1:100)


testdata <- testdata %>%
            group_by(visit) %>%
            mutate(#i can't figure out what to put here
            #positivity[1] = positivityvalues[1] etc...
            #response[1] = sample(responseranges[1], 1) etc...
            )

to get something like this (for the sake of clarity, only the first two subjects and isotypes are listed)

subject    visit    isotype    positivity    response
  1         D0       IgG          NA           58
  1         D100     IgG          1            27093
  1         D500     IgG          0            2   
  1         D0       IgA          NA           42
  1         D100     IgA          1            28921
  1         D500     IgA          0            85      
  2         D0       IgG          NA           86
  2         D100     IgG          1            26039
  2         D500     IgG          0            54   
  2         D0       IgA          NA           99
  2         D100     IgA          1            29021
  2         D500     IgA          0            23

Thanks

Edit* finished updates

Edit2* Solution:

ranges <- list(D0=c(1:100), D100=c(25000:32000), D500=c(1:100))
positives <- c(D0=NA, D100=1, D500=0)

testdata$positivity <- positives[testdata$visit]
testdata$responsetemp <- ranges[testdata$visit] 
testdata$reponse <- lapply(testdata$responsetemp, function(x) sample(x, 1))

Andrew Gustar · Accepted Answer

You can do this with a named vector...

testdata <- expand.grid(subject=subject, visit=visit, isotype=isotype) 
                                   #this way to get column names

positivityvalues <- c(D0=NA, D100=1, D500=0) #add names

testdata$positivity <- positivityvalues[testdata$visit] #adds value by name

You could do something similar with the parameters for the sample function in the response column.

For each group assign different values from vector

Answers (2)

Related Questions