Reputation: 21400
I have this kind of table (reproducible below):
# A tibble: 11 x 6
Line Speaker Utterance Gaze_by AOI dur
<int> <chr> <chr> <chr> <chr> <int>
1 1 A hi there A C 100
2 1 A hi there A * 25
3 1 A hi there A B 30
4 1 A hi there A * 50
5 1 A hi there A C 144
6 2 B how're ya B * 50
7 2 B how're ya B A 22
8 2 B how're ya B * 33
9 2 B how're ya B A 100
10 2 B how're ya B * 150
11 2 B how're ya B C 999
I need to sum the dur
values grouped by Speaker
, Gaze_by
, and AOI
and calculate the relative proportions. Specifically, I want to summarise the grouped dur
values where AOI
has a capital letter A
, B
, or C
as opposed to *
. That latter part is what I'm asking help for. What I can do is summarise by all/any AOI
value:
d %>%
group_by(Speaker, Utterance, Gaze_by, AOI) %>%
summarise(dur = sum(dur)) %>%
mutate(prop = dur/sum(dur)*100)
# A tibble: 6 x 6
# Groups: Speaker, Utterance, Gaze_by [2]
Speaker Utterance Gaze_by AOI dur prop
<chr> <chr> <chr> <chr> <int> <dbl>
1 A hi there A * 75 21.5
2 A hi there A B 30 8.60
3 A hi there A C 244 69.9
4 B how're ya B * 233 17.2
5 B how're ya B A 122 9.01
6 B how're ya B C 999 73.8
But what I'm really looking for is this:
Speaker Utterance Gaze_by AOI dur prop
<chr> <chr> <chr> <chr> <int> <dbl>
1 A hi there A * 75 21.5
2 A hi there A X 274 78.5 # X instead of A B, or C
3 B how're ya B * 233 17.2
4 B how're ya B X 1122 82.8 # X instead of A B, or C
How can this be accomplished?
Reproducible data:
structure(list(Line = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L), Speaker = c("A", "A", "A", "A", "A", "B", "B", "B", "B",
"B", "B"), Utterance = c("hi there", "hi there", "hi there",
"hi there", "hi there", "how're ya", "how're ya", "how're ya",
"how're ya", "how're ya", "how're ya"), Gaze_by = c("A", "A",
"A", "A", "A", "B", "B", "B", "B", "B", "B"), AOI = c("C", "*",
"B", "*", "C", "*", "A", "*", "A", "*", "C"), dur = c(100L, 25L,
30L, 50L, 144L, 50L, 22L, 33L, 100L, 150L, 999L)), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 0
Views: 34
Reputation: 11584
Does this work:
df %>% group_by(Speaker, Gaze_by, 'AOI' = str_replace(AOI,'\\w','X')) %>% summarise(dur = sum(dur)) %>%
mutate(prop = dur/sum(dur)*100)
`summarise()` has grouped output by 'Speaker', 'Gaze_by'. You can override using the `.groups` argument.
# A tibble: 4 x 5
# Groups: Speaker, Gaze_by [2]
Speaker Gaze_by AOI dur prop
<chr> <chr> <chr> <int> <dbl>
1 A A * 75 21.5
2 A A X 274 78.5
3 B B * 233 17.2
4 B B X 1121 82.8
Upvotes: 1