Reputation: 173
I have a dataframe with data from multiple experiments with multiple conditions. In each of these, there are multiple periods and multiple subjects who interact in these periods.
My data looks as follows (the first five columns are the raw data; G is what I have generated so far, and GNew is the result I want):
Experiment  Condition  Period  Subject  E   G   GNew
1           1          1       1        20  1   1
1           1          1       2        60  2   2
1           1          1       3        20  1   1
1           1          1       4        60  2   2
1           1          2       1        23  NA  1
1           1          2       2        45  NA  2
1           1          2       3        13  NA  1
1           1          2       4        20  NA  2
1           2          1       1        50  3   3
1           2          1       2        50  3   3
1           2          1       3        40  4   4
1           2          1       4        50  3   3
1           2          2       1        23  NA  3
1           2          2       2        45  NA  3
1           2          2       3        13  NA  4
1           2          2       4        20  NA  3
I now want to generate a variable GNew which groups subjects into groups depending on the value E in the first period within the same experiment and condition.
I have succeeded in generating the column G but what I would like is to end up with a variable like GNew, which assigns a group number to each subject based on their value in E in the first period, but contains this number in every period. Different experiments and conditions are independent of each other and should receive different group numbers GNew, as in the data shown above.
I can achieve this with nested for-loops, but I am sure there is a more elegant solution using aggregate, by, apply, data.table or some such. I have googled for this for a while now, but the solution still eludes me.
Upvotes: 2
Views: 348
Reputation: 1279
If the tidyverse is not forbidden, you could do a group_by, then an arrange, then a mutate selecting the first element within each group.
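library(dplyr)

data %>%
  # one group per subject within each experiment and condition
  group_by(Experiment, Condition, Subject) %>%
  # put Period 1 first so that G[1] is the (non-NA) first-period group
  arrange(Period) %>%
  mutate(GNew = G[1]) %>%
  ungroup() -> data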
(n.b. not tested)
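If you would rather derive the groups from E directly, without relying on an existing G column, here is a minimal base R sketch. It rebuilds the example data from the question and assumes that groups should be numbered in order of first appearance, which is what the GNew column in the question appears to do. The helper names first and key are only illustrative.

```r
# Example data from the question (without G and GNew)
data <- data.frame(
  Experiment = 1,
  Condition  = rep(1:2, each = 8),
  Period     = rep(rep(1:2, each = 4), times = 2),
  Subject    = rep(1:4, times = 4),
  E = c(20, 60, 20, 60, 23, 45, 13, 20,
        50, 50, 40, 50, 23, 45, 13, 20)
)

# Each subject's E in the first period
first <- data[data$Period == 1, c("Experiment", "Condition", "Subject", "E")]

# One key per distinct (Experiment, Condition, first-period E) combination
key <- paste(first$Experiment, first$Condition, first$E, sep = "_")

# Number the keys in order of first appearance
first$GNew <- match(key, unique(key))
first$E <- NULL

# Attach the group number to every period of each subject
data <- merge(data, first, by = c("Experiment", "Condition", "Subject"))

# merge() sorts by the key columns, so restore the original row order
data <- data[order(data$Experiment, data$Condition, data$Period, data$Subject), ]
```

Because the key includes Experiment and Condition, equal E values in different experiments or conditions end up in different groups. Pasting E into a string is fine for whole numbers like these; for non-integer E you may want to round first.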
Upvotes: 2