Reputation: 75
I have a problem tidying a large dataset that would highly appreciate your help with. A subset of my data would look like this:
> d1
# A tibble: 16 x 8
Subject RT1 RT2 item ROI `Item _Number` IA_LABEL block
<dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
1 1 567 261 brilliant salesman 1 1 brilliant 3
2 1 494 76 brilliant salesman 2 1 salesman 3
3 1 441 211 detailed brochure 1 2 detailed 3
4 1 544 282 detailed brochure 2 2 brochure 3
5 1 289 153 detailed brochure 1 2 detailed 4
6 1 141 141 detailed brochure 2 2 brochure 4
7 1 141 141 brilliant salesman 1 1 brilliant 4
8 1 263 90 brilliant salesman 2 1 salesman 4
9 2 216 216 detailed brochure 1 2 detailed 3
10 2 248 248 detailed brochure 2 2 brochure 3
11 2 216 128 brilliant salesman 1 1 brilliant 3
12 2 238 140 brilliant salesman 2 1 salesman 3
13 2 212 212 detailed brochure 1 2 detailed 6
14 2 369 241 detailed brochure 2 2 brochure 6
15 2 208 208 brilliant salesman 1 1 brilliant 6
16 2 191 191 brilliant salesman 2 1 salesman 6
dput (d1)
structure(list(Subject = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
2, 2, 2, 2), RT1 = c(567, 494, 441, 544, 289, 141, 141, 263,
216, 248, 216, 238, 212, 369, 208, 191), RT2 = c(261, 76, 211,
282, 153, 141, 141, 90, 216, 248, 128, 140, 212, 241, 208, 191
), item = c("brilliant salesman", "brilliant salesman", "detailed brochure",
"detailed brochure", "detailed brochure", "detailed brochure",
"brilliant salesman", "brilliant salesman", "detailed brochure",
"detailed brochure", "brilliant salesman", "brilliant salesman",
"detailed brochure", "detailed brochure", "brilliant salesman",
"brilliant salesman"), ROI = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 2, 1, 2), `Item _Number` = c(1, 1, 2, 2, 2, 2, 1, 1,
2, 2, 1, 1, 2, 2, 1, 1), IA_LABEL = c("brilliant", "salesman",
"detailed", "brochure", "detailed", "brochure", "brilliant",
"salesman", "detailed", "brochure", "brilliant", "salesman",
"detailed", "brochure", "brilliant", "salesman"), block = c(3,
3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 6, 6, 6, 6)), row.names = c(NA,
-16L), class = c("tbl_df", "tbl", "data.frame"))
This data represents the reaction times (RT1 and RT2) once reading some words (IA_Label). In each row of the column (IA_LABEL), is the reaction times (RT1 AND RT2) for only ONE word. So, these words in IA_LABEL represents the first word of a phrase (where ROI=1) and the second word of the same phrase (where ROI=2). The way I programmed this experiment allowed me to only look at each word separately. I am trying also to look at RT1 and RT2 for the whole phrase as well (RT1 for the whole phrase is RT1 where ROI is 1 + RT1 where ROI is 2. The same process can be applied for the RT2 by summing its values where ROI =1 + where ROI=2.
I am trying to create two columns (variables) by summing the values of the dependent variables (RT1 and RT2).
So, ideally I am hoping to have a code that would mutate two variables in a similar way to the one below, where the first row of Sum_RT1
= RT1
of that row (where ROI
=1) + RT2
of the subsequent row where ROI
=2. The second row of Sum_RT1
would be the same value since the same summing process would be applied, but in the reverse direction (i.e., RT1
where ROI
is 2 + RT1
where ROI
is 2.
> d2
# A tibble: 16 x 10
Subject RT1 RT2 item ROI `Item _Number` IA_LABEL block Sum_RT1 Sum_RT2
<dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 1 567 261 brill~ 1 1 brillia~ 3 1061 337
2 1 494 76 brill~ 2 1 salesman 3 1061 337
3 1 441 211 detai~ 1 2 detailed 3 985 493
4 1 544 282 detai~ 2 2 brochure 3 985 493
5 1 289 153 detai~ 1 2 detailed 4 430 294
6 1 141 141 detai~ 2 2 brochure 4 430 294
7 1 141 141 brill~ 1 1 brillia~ 4 404 231
8 1 263 90 brill~ 2 1 salesman 4 404 231
9 2 216 216 detai~ 1 2 detailed 3 464 464
10 2 248 248 detai~ 2 2 brochure 3 464 464
11 2 216 128 brill~ 1 1 brillia~ 3 454 268
12 2 238 140 brill~ 2 1 salesman 3 454 268
13 2 212 212 detai~ 1 2 detailed 6 581 453
14 2 369 241 detai~ 2 2 brochure 6 581 453
15 2 208 208 brill~ 1 1 brillia~ 6 399 399
16 2 191 191 brill~ 2 1 salesman 6 399 399
I plan to apply the the process for many variables at a larger data, so would highly apprecaite your thoughts and help.
Upvotes: 0
Views: 83
Reputation: 607
Would grouping the data by Subject, Item_Number & block give the desired output?
colnames(d1)[6] <- "Item_Number" # delete the blank in the column name (otherwise it gives me an error...)
d2 <- d1 %>%
group_by(Subject,block, Item_Number) %>%
mutate(SumRT1 = sum(RT1),
SumRT2 = sum(RT2))
Output:
# A tibble: 16 x 10
# Groups: Subject, block, Item_Number [8]
Subject RT1 RT2 item ROI Item_Number IA_LABEL block SumRT1 SumRT2
<dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 1 567 261 brilliant salesman 1 1 brilliant 3 1061 337
2 1 494 76 brilliant salesman 2 1 salesman 3 1061 337
3 1 441 211 detailed brochure 1 2 detailed 3 985 493
4 1 544 282 detailed brochure 2 2 brochure 3 985 493
5 1 289 153 detailed brochure 1 2 detailed 4 430 294
6 1 141 141 detailed brochure 2 2 brochure 4 430 294
7 1 141 141 brilliant salesman 1 1 brilliant 4 404 231
8 1 263 90 brilliant salesman 2 1 salesman 4 404 231
9 2 216 216 detailed brochure 1 2 detailed 3 464 464
10 2 248 248 detailed brochure 2 2 brochure 3 464 464
11 2 216 128 brilliant salesman 1 1 brilliant 3 454 268
12 2 238 140 brilliant salesman 2 1 salesman 3 454 268
13 2 212 212 detailed brochure 1 2 detailed 6 581 453
14 2 369 241 detailed brochure 2 2 brochure 6 581 453
15 2 208 208 brilliant salesman 1 1 brilliant 6 399 399
16 2 191 191 brilliant salesman 2 1 salesman 6 399 399
Upvotes: 1