r3vdev

Reputation: 325

For each group assign different values from vector

I am trying to generate a fake dataset for testing.

It was easy enough to generate the columns that exist in all combinations:

subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")

testdata <- expand.grid(subject, visit, isotype)

names(testdata) <- c("subject", "visit", "isotype")

Now I need to create two more columns: "positivity", with a particular value for each group in "visit", and "response", with a random integer drawn from a range that depends on the group in "visit".

For "positivity", I could do it this way:

testdata[testdata$visit == "D0", c("positivity")] <- NA
testdata[testdata$visit == "D100", c("positivity")] <- 1
testdata[testdata$visit == "D500", c("positivity")] <- 0

and for "response", I could do it this way:

testdata[testdata$visit == "D0", c("response")] <- sample(1:100, 1)
testdata[testdata$visit == "D100", c("response")] <- sample(20000:30000, 1)
testdata[testdata$visit == "D500", c("response")] <- sample(1:100, 1)

But in reality I have many more unique values in "visit" than this, so writing one line per value would take forever. I was hoping I could use dplyr and group_by to loop through each group, assign "positivity" from a vector (its length should equal the number of groups in "visit"), and assign "response" by passing the matching entry from a vector of ranges to sample.

positivityvalues <- c(NA, 1, 0)
responseranges <- list(1:100, 1:500, 1:100) # a list, because c() would flatten the three ranges into one vector


testdata <- testdata %>%
            group_by(visit) %>%
            mutate(#i can't figure out what to put here
            #positivity[1] = positivityvalues[1] etc...
            #response[1] = sample(responseranges[1], 1) etc...
            )

to get something like this (for the sake of clarity, only the first two subjects and isotypes are listed):

subject    visit    isotype    positivity    response
  1         D0       IgG          NA           58
  1         D100     IgG          1            27093
  1         D500     IgG          0            2   
  1         D0       IgA          NA           42
  1         D100     IgA          1            28921
  1         D500     IgA          0            85      
  2         D0       IgG          NA           86
  2         D100     IgG          1            26039
  2         D500     IgG          0            54   
  2         D0       IgA          NA           99
  2         D100     IgA          1            29021
  2         D500     IgA          0            23  

Thanks

Edit* finished updates

Edit2* Solution:

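ranges <- list(D0 = 1:100, D100 = 25000:32000, D500 = 1:100)
positives <- c(D0 = NA, D100 = 1, D500 = 0)

# convert the factor to character so the lookups go by name, not by level code
visit_chr <- as.character(testdata$visit)
testdata$positivity <- unname(positives[visit_chr])
# draw one value per row from that row's range; vapply returns a plain integer column
testdata$response <- vapply(ranges[visit_chr], function(x) sample(x, 1L), integer(1), USE.NAMES = FALSE)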

Upvotes: 0

Views: 450

Answers (2)

akrun

Reputation: 887068

Here is an option using tidyverse. Create a named vector keyed by the unique values of 'visit' (it is not clear how the values should change when there are more unique elements in 'visit'). Use it to match the 'visit' elements and replace them with the corresponding NA, 1 or 0. Then split the data by 'visit' and use map2 to sample from the range that corresponds to each piece.

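library(tidyverse)
# name the lookup values by visit, in order of appearance
v1 <- setNames(c(NA, 1, 0), as.character(unique(testdata$visit)))
testdata %>%
     # index by name rather than by factor level code
     mutate(positivity = v1[as.character(visit)]) %>%
     split(.$visit) %>%
     map2_df(., list(1:100, 20000:30000, 1:100), ~
           .x %>%
           # replace = TRUE so a group with more rows than the range still works
           mutate(response = sample(.y, n(), replace = TRUE)))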
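
For comparison, here is a sketch of the group_by/mutate form the question was reaching for. It assumes the ranges are kept in a named list; `positives`, `ranges` and `result` are names chosen for this illustration, and the data is rebuilt so the snippet runs on its own. Within a group every row has the same visit, so `visit[1]` is enough to pick the matching range.

```r
library(dplyr)

subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")
testdata <- expand.grid(subject = subject, visit = visit, isotype = isotype)

# Illustrative lookups keyed by visit name (assumed, not from the original answer)
positives <- c(D0 = NA, D100 = 1, D500 = 0)
ranges <- list(D0 = 1:100, D100 = 20000:30000, D500 = 1:100)

result <- testdata %>%
  # look up positivity by name, not by factor level code
  mutate(positivity = unname(positives[as.character(visit)])) %>%
  group_by(visit) %>%
  # all rows in a group share one visit, so visit[1] identifies the group;
  # draw n() values with replacement from that group's range
  mutate(response = sample(ranges[[as.character(visit[1])]], n(), replace = TRUE)) %>%
  ungroup()
```

Adding another visit then only means adding one entry to each lookup, rather than another line of assignment code.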

Upvotes: 1

Andrew Gustar

Reputation: 18425

You can do this with a named vector...

testdata <- expand.grid(subject = subject, visit = visit, isotype = isotype) # naming the arguments sets the column names

positivityvalues <- c(D0 = NA, D100 = 1, D500 = 0) # add names

# as.character() makes the lookup go by name; a bare factor would index by level code
testdata$positivity <- positivityvalues[as.character(testdata$visit)]

You could do something similar with the parameters for the sample function in the response column.
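
As a rough sketch of that idea, assume the bounds for each visit are stored in two named vectors. `lower`, `upper` and `key` are hypothetical names introduced here, and the bounds are taken from the ranges in the question. Each row looks up its own bounds by visit name and gets one independent draw.

```r
subject <- 1:5
visit <- c("D0", "D100", "D500")
isotype <- c("IgG", "IgA", "IgM", "IgD")
testdata <- expand.grid(subject = subject, visit = visit, isotype = isotype)

# Hypothetical sampling bounds per visit, looked up by name
lower <- c(D0 = 1, D100 = 20000, D500 = 1)
upper <- c(D0 = 100, D100 = 30000, D500 = 100)

# Character keys so the lookup goes by name rather than by factor level code
key <- as.character(testdata$visit)

# One independent integer draw per row, within that row's bounds
testdata$response <- mapply(
  function(lo, hi) sample(lo:hi, 1),
  lower[key], upper[key],
  USE.NAMES = FALSE
)
```

Every range here has more than one element. A range of length one would hit the special case where `sample(n, 1)` samples from `1:n`, so that case would need separate handling.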

Upvotes: 1
