user3585829

Reputation: 965

Creating an index for each subject in R

I'm working with some data on repeated measures of subjects over time. The data is in this format:

Subject <- as.factor(c(rep("A", 20), rep("B", 35), rep("C", 13)))
variable.A <- rnorm(n = length(Subject), mean = 300, sd = 50)
dat <- data.frame(Subject, variable.A)
dat

   Subject variable.A
1        A   334.6567
2        A   353.0988
3        A   244.0863
4        A   284.8918
5        A   302.6442
6        A   298.3162
7        A   271.4864
8        A   268.6848
9        A   262.3761
10       A   341.4224
11       A   190.4823
12       A   297.1981
13       A   319.8346
14       A   343.9855
15       A   332.5318
16       A   221.9502
17       A   412.9172
18       A   283.4206
19       A   310.9847
20       A   276.5423
21       B   181.5418
22       B   340.5812
23       B   348.5162
24       B   364.6962
25       B   312.2508
26       B   278.9855
27       B   242.8810
28       B   272.9585
29       B   239.2776
30       B   254.9140
31       B   253.8940
32       B   330.1918
33       B   300.7302
34       B   237.6511
35       B   314.4919
36       B   239.6195
37       B   282.7955
38       B   260.0943
39       B   396.5310
40       B   325.5422
41       B   374.8063
42       B   363.1897
43       B   258.0310
44       B   358.8605
45       B   251.8775
46       B   299.6995
47       B   303.4766
48       B   359.8955
49       B   299.7089
50       B   289.3128
51       B   401.7680
52       B   276.8078
53       B   441.4852
54       B   232.6222
55       B   305.1977
56       C   298.4580
57       C   210.5164
58       C   272.0228
59       C   282.0540
60       C   207.8797
61       C   263.3859
62       C   324.4417
63       C   273.5904
64       C   348.4389
65       C   174.2979
66       C   363.4353
67       C   260.8548
68       C   306.1833

I've used the seq_along() function and the dplyr package to create an index of each observation for every subject:

library(dplyr)

dat <- as.data.frame(dat %>%
            group_by(Subject) %>%
            mutate(index = seq_along(Subject)))

   Subject variable.A index
1        A   334.6567     1
2        A   353.0988     2
3        A   244.0863     3
4        A   284.8918     4
5        A   302.6442     5
6        A   298.3162     6
7        A   271.4864     7
8        A   268.6848     8
9        A   262.3761     9
10       A   341.4224    10
11       A   190.4823    11
12       A   297.1981    12
13       A   319.8346    13
14       A   343.9855    14
15       A   332.5318    15
16       A   221.9502    16
17       A   412.9172    17
18       A   283.4206    18
19       A   310.9847    19
20       A   276.5423    20
21       B   181.5418     1
22       B   340.5812     2
23       B   348.5162     3
24       B   364.6962     4
25       B   312.2508     5
26       B   278.9855     6
27       B   242.8810     7
28       B   272.9585     8
29       B   239.2776     9
30       B   254.9140    10
31       B   253.8940    11
32       B   330.1918    12
33       B   300.7302    13
34       B   237.6511    14
35       B   314.4919    15
36       B   239.6195    16
37       B   282.7955    17
38       B   260.0943    18
39       B   396.5310    19
40       B   325.5422    20
41       B   374.8063    21
42       B   363.1897    22
43       B   258.0310    23
44       B   358.8605    24
45       B   251.8775    25
46       B   299.6995    26
47       B   303.4766    27
48       B   359.8955    28
49       B   299.7089    29
50       B   289.3128    30
51       B   401.7680    31
52       B   276.8078    32
53       B   441.4852    33
54       B   232.6222    34
55       B   305.1977    35
56       C   298.4580     1
57       C   210.5164     2
58       C   272.0228     3
59       C   282.0540     4
60       C   207.8797     5
61       C   263.3859     6
62       C   324.4417     7
63       C   273.5904     8
64       C   348.4389     9
65       C   174.2979    10
66       C   363.4353    11
67       C   260.8548    12
68       C   306.1833    13

What I'd now like to do is set up an analysis that looks at every 10 observations, so I need another column that numbers each block of 10 observations within a subject. For example, Subject A would have ten 1s followed by ten 2s (i.e., two groups of 10). I've tried the rep() function, but the problem is that the other subjects don't have a number of observations that is divisible by 10.

Is there a way for rep() to simply assign the next group number, even when fewer than 10 observations remain? For example, Subject B (35 observations) would have ten 1s, ten 2s, ten 3s, and then five 4s (his last, partial group).

Upvotes: 0

Views: 319

Answers (2)

Woodstock

Reputation: 389

For a plain vanilla base R solution, you could also try this:

dat$newcol <- 1
dat$index <- ave(dat$newcol, dat$Subject, FUN = cumsum)
dat$chunk_id <- (dat$index - 1) %/% 10 + 1

Tabulating the result by subject and chunk gives:

table(dat$Subject, dat$chunk_id)
     1  2  3  4
  A 10 10  0  0
  B 10 10 10  5
  C 10  3  0  0

If you don't want the extra 'newcol' column, just use 'NULL' to get rid of it:

dat$newcol <- NULL
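
Since the question asked specifically about rep(), here is a minimal base R sketch of that route. It relies on the fact that rep() applies each first and then truncates to length.out, so the last block can be shorter than 10. The helper name chunk_ids is hypothetical (it does not come from any package), and the data frame is rebuilt from the question's group sizes only:

```r
# Hypothetical helper (not part of the original answers):
# label n rows in consecutive blocks of `size`, with a shorter final block if needed
chunk_ids <- function(n, size = 10) {
  rep(seq_len(ceiling(n / size)), each = size, length.out = n)
}

# Same group sizes as in the question; the measured values are not needed here
Subject <- factor(c(rep("A", 20), rep("B", 35), rep("C", 13)))
dat <- data.frame(Subject)

# ave() hands each subject's row positions to FUN; only their count matters
dat$chunk_id <- ave(seq_along(dat$Subject), dat$Subject,
                    FUN = function(i) chunk_ids(length(i)))

table(dat$Subject, dat$chunk_id)
```

This also avoids the temporary newcol column, at the cost of assuming that each subject's rows are already in the order you want them chunked.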

Upvotes: 1

akuiper

Reputation: 214927

You can use integer division %/% to generate the ids:

library(dplyr)

dat1 <- dat %>%
    group_by(Subject) %>%
    mutate(chunk_id = (seq_along(Subject) - 1) %/% 10 + 1)

table(dat1$Subject, dat1$chunk_id)

#     1  2  3  4
#  A 10 10  0  0
#  B 10 10 10  5
#  C 10  3  0  0
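
To see why the expression subtracts 1 before dividing and adds 1 afterwards, it may help to run the arithmetic on a few within-subject positions. This is only an illustrative sketch; the vector index is made up for the example and is not taken from the data:

```r
# A few within-subject positions, chosen around the block boundaries
index <- c(1, 9, 10, 11, 20, 21, 35)

# Plain integer division puts observation 10 into the second block already
index %/% 10
# 0 0 1 1 2 2 3

# Shifting by 1 maps positions 1..10 to 0, 11..20 to 1, and so on;
# adding 1 afterwards makes the chunk numbering start at 1
(index - 1) %/% 10 + 1
# 1 1 1 2 2 3 4

# For positive whole-number positions, ceiling() gives the same labels
ceiling(index / 10)
# 1 1 1 2 2 3 4
```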

Upvotes: 3
