Generate identifier for subgroups based on one row in a subgroup

Question

I have a dataframe with data from multiple experiments with multiple conditions. In each of these, there are multiple periods and multiple subjects who interact in these periods.

My data looks as follows (the first five columns are the actual data; G and GNew are explained below):

Experiment Condition Period Subject   E    G   GNew
     1         1        1      1     20    1     1
     1         1        1      2     60    2     2
     1         1        1      3     20    1     1
     1         1        1      4     60    2     2
     1         1        2      1     23   NA     1
     1         1        2      2     45   NA     2
     1         1        2      3     13   NA     1
     1         1        2      4     20   NA     2
     1         2        1      1     50    3     3
     1         2        1      2     50    3     3
     1         2        1      3     40    4     4
     1         2        1      4     50    3     3
     1         2        2      1     23   NA     3
     1         2        2      2     45   NA     3
     1         2        2      3     13   NA     4
     1         2        2      4     20   NA     3

I now want to generate a variable GNew which groups subjects according to their value of E in the first period within the same experiment and condition.

I have succeeded in generating the column G, but it is only filled in for the first period. What I would like is a variable like GNew, which assigns each subject a group number based on their value of E in the first period and carries that number through every period. Different experiments and conditions are independent of each other and should receive different group numbers, as in the data shown above.

I can achieve this with nested for-loops, but I am sure there is a more elegant solution using aggregate, by, apply, data.table or similar. I have been googling this for a while now, but the solution still eludes me.

JonMinton · Accepted Answer

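If the tidyverse is not forbidden, you could group_by experiment, condition and subject, use mutate to pick each subject's E from their first period, and then number the distinct combinations in order of first appearance.

library(dplyr)

data %>%
    group_by(Experiment, Condition, Subject) %>%
    mutate(E1 = E[which.min(Period)]) %>%  # subject's E in their first period
    ungroup() %>%
    mutate(key = paste(Experiment, Condition, E1),
           GNew = match(key, unique(key))) %>%  # number groups by first appearance
    select(-E1, -key) -> data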

(n.b. not tested)
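
Without the tidyverse, a base R sketch could look like the following. It rebuilds the sample data from the question (first five columns only) and assumes that groups should be numbered in order of first appearance, which is what the GNew column in the question suggests. The names first_E and key are just illustrative helpers, not anything from the original post.

```r
# Sample data from the question (first five columns only).
data <- data.frame(
  Experiment = 1,
  Condition  = rep(1:2, each = 8),
  Period     = rep(rep(1:2, each = 4), times = 2),
  Subject    = rep(1:4, times = 4),
  E          = c(20, 60, 20, 60, 23, 45, 13, 20,
                 50, 50, 40, 50, 23, 45, 13, 20)
)

# For every row, look up that subject's E in their earliest period.
# ave() runs the function once per Experiment/Condition/Subject cell;
# row indices are passed in so that both E and Period can be inspected.
first_E <- ave(
  seq_len(nrow(data)),
  data$Experiment, data$Condition, data$Subject,
  FUN = function(i) data$E[i][which.min(data$Period[i])]
)

# One key per (Experiment, Condition, first-period E) combination,
# numbered in order of first appearance (an assumption, see above).
key <- paste(data$Experiment, data$Condition, first_E)
data$GNew <- match(key, unique(key))

print(data)
```

If the numbering order does not matter, any other way of turning key into consecutive integers (for example as.integer(factor(key))) should also work, but the numbers would then follow the sorted order of the keys rather than the order in the data.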
