colonus
colonus

Reputation: 329

dplyr: comparing values within a variable dependent on another variable

How can I compare values within a variable dependent on another variable with dplyr?

The df is based on choice data (long format) from a survey. It has one variable that indicates a participants id, another that indicates the choice instance and one that indicates which alternative was chosen. In my data I have the feeling that a lot of people tend to get bored of the task and therefore stick to one alternative for every instance. I would therefore like to identify people who always selected the same option from a certain instance onwards till the end.

Here is an example df:

set.seed(0)
df <- tibble(
    id = rep(1:5,each=12),
    inst = rep(1:12,5),
    alt = sample(1:3, size =60, replace=T),
)

That looks like the following:

   id     inst alt
1   1        1   3
2   1        2   1
3   1        3   2
4   1        4   2
5   1        5   3
6   1        6   1
7   1        7   3
8   1        8   3
9   1        9   2
10  1       10   2
11  1       11   1   <-
12  1       12   1   <-
13  2        1   1
14  2        2   3
...

I would like to create two new variables count and count_alt. The new variable count should indicate how often the same value appeared in alt based on id and inst, only counting values from the end of id. So for participant (id==1) the count variable should be 2, since alternative 1 was chosen in the last two instances (11 & 12). The count_alt would take the value 1 (always the same as inst == 12)

The new df schould look like the following

 id     inst   alt   count   count_alt
1   1        1   3       2           1
2   1        2   1       2           1
3   1        3   2       2           1
4   1        4   2       2           1
5   1        5   3       2           1
6   1        6   1       2           1
7   1        7   3       2           1
8   1        8   3       2           1
9   1        9   2       2           1
10  1       10   2       2           1
11  1       11   1       2           1
12  1       12   1       2           1     
...

I would prefer to solve this with dplyr and not with a loop since I want to incooperate it into further data wrangling steps.

Upvotes: 0

Views: 215

Answers (1)

arg0naut91
arg0naut91

Reputation: 14764

See if that solves it:

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(
    count = cumsum(alt != lag(alt, default = "rndm")),
    count = sum(count == max(count)),
    count_alt = alt[n()]
  )

Output:

   id inst alt count count_alt
1   1    1   3     2         1
2   1    2   1     2         1
3   1    3   2     2         1
4   1    4   2     2         1
5   1    5   3     2         1
6   1    6   1     2         1
7   1    7   3     2         1
8   1    8   3     2         1
9   1    9   2     2         1
10  1   10   2     2         1
11  1   11   1     2         1
12  1   12   1     2         1
13  2    1   1     1         2
14  2    2   3     1         2
15  2    3   2     1         2
16  2    4   3     1         2
17  2    5   2     1         2
18  2    6   3     1         2
19  2    7   3     1         2
20  2    8   2     1         2
21  2    9   3     1         2
22  2   10   3     1         2
23  2   11   1     1         2
24  2   12   2     1         2
25  3    1   1     1         3
26  3    2   1     1         3
27  3    3   2     1         3
28  3    4   1     1         3
29  3    5   2     1         3
30  3    6   3     1         3
31  3    7   2     1         3
32  3    8   2     1         3
33  3    9   2     1         3
34  3   10   2     1         3
35  3   11   1     1         3
36  3   12   3     1         3
37  4    1   3     1         1
38  4    2   3     1         1
39  4    3   1     1         1
40  4    4   3     1         1
41  4    5   2     1         1
42  4    6   3     1         1
43  4    7   2     1         1
44  4    8   3     1         1
45  4    9   2     1         1
46  4   10   2     1         1
47  4   11   3     1         1
48  4   12   1     1         1
49  5    1   2     2         2
50  5    2   3     2         2
51  5    3   3     2         2
52  5    4   2     2         2
53  5    5   3     2         2
54  5    6   2     2         2
55  5    7   1     2         2
56  5    8   1     2         2
57  5    9   1     2         2
58  5   10   1     2         2
59  5   11   2     2         2
60  5   12   2     2         2

Upvotes: 1

Related Questions