user8840683
user8840683

Reputation:

create a variable separately but uniquely for each `id` in a data.frame in R

I have a data.frame with two variables es and id. I want to add a new variable called weeks. But I want to create weeks separately but uniquely for each id.

For example, if es == "SHORT" for all rows whose id == 1, I want the same number (e.g., 3). And if id == 2 a different number (e.g., 1).

Can I achieve this in BASE R (see desired output structure below)?

NOTE: SHORT < DEL1 < DEL2 in terms of numerical value under each id.

Below is the data and code I tried without success:

D <- data.frame(es = c("SHORT", "SHORT", "SHORT","DEL1", "DEL1","DEL1","SHORT", 
            "SHORT", "SHORT", "DEL1", "DEL1", "DEL1","DEL2","DEL2","DEL2"),

                id = c(rep(1, 6), rep(2, 9)) ) 


weeks <- ifelse(D$es == "SHORT", sample(1:5, 6, T), ifelse(D$es == "DEL1", 
                                                        sample(4:8, 7, T),
                                                        sample(7:12, 2, T)))

Desired output structure (numeric values are random):

   es  id   weeks
SHORT  1     3
SHORT  1     3
SHORT  1     3
 DEL1  1     5
 DEL1  1     5
 DEL1  1     5
SHORT  2     1
SHORT  2     1
SHORT  2     1
 DEL1  2     6
 DEL1  2     6
 DEL1  2     6
 DEL2  2     8
 DEL2  2     8
 DEL2  2     8

Upvotes: 1

Views: 87

Answers (2)

Parfait
Parfait

Reputation: 107587

Consider diff and cumsum for sequential ordering of distinct groupings:

set.seed(8132019)
rand <- sample(1:10, 10, replace=FALSE)

D <- within(D, {        
          diff <- c(0,diff(es)) + c(0, diff(id))
          weeks <- cumsum(ifelse(diff == 0, 0, 1)) + 1

          rm(diff)
    })

D
#       es id weeks
# 1  SHORT  1     1
# 2  SHORT  1     1
# 3  SHORT  1     1
# 4   DEL1  1     2
# 5   DEL1  1     2
# 6   DEL1  1     2
# 7  SHORT  2     3
# 8  SHORT  2     3
# 9  SHORT  2     3
# 10  DEL1  2     4
# 11  DEL1  2     4
# 12  DEL1  2     4
# 13  DEL2  2     5
# 14  DEL2  2     5
# 15  DEL2  2     5

Upvotes: 1

Marcus
Marcus

Reputation: 3636

Essentially what @markus suggested. Can replace the seq_along with sample or other function if you need the weeks to be random.

D <- data.frame(es = c("SHORT", "SHORT", "SHORT","DEL1", "DEL1","DEL1","SHORT", 
                       "SHORT", "SHORT", "DEL1", "DEL1", "DEL1","DEL2","DEL2","DEL2"),

                id = c(rep(1, 6), rep(2, 9)) ) 

weeksTbl <- unique(D)
weeksTbl$weeks <- seq_along(weeksTbl[[1]])

merge(D, weeksTbl, all = TRUE, sort = FALSE)

#>       es id weeks
#> 1  SHORT  1     1
#> 2  SHORT  1     1
#> 3  SHORT  1     1
#> 4   DEL1  1     2
#> 5   DEL1  1     2
#> 6   DEL1  1     2
#> 7  SHORT  2     3
#> 8  SHORT  2     3
#> 9  SHORT  2     3
#> 10  DEL1  2     4
#> 11  DEL1  2     4
#> 12  DEL1  2     4
#> 13  DEL2  2     5
#> 14  DEL2  2     5
#> 15  DEL2  2     5

Upvotes: 1

Related Questions