Chris Ruehlemann
Chris Ruehlemann

Reputation: 21400

Summarise by distinct character values

I have this kind of table (reproducible below):

# A tibble: 11 x 6
    Line Speaker Utterance Gaze_by AOI     dur
   <int> <chr>   <chr>     <chr>   <chr> <int>
 1     1 A       hi there  A       C       100
 2     1 A       hi there  A       *        25
 3     1 A       hi there  A       B        30
 4     1 A       hi there  A       *        50
 5     1 A       hi there  A       C       144
 6     2 B       how're ya B       *        50
 7     2 B       how're ya B       A        22
 8     2 B       how're ya B       *        33
 9     2 B       how're ya B       A       100
10     2 B       how're ya B       *       150
11     2 B       how're ya B       C       999

I need to sum the dur values grouped by Speaker, Gaze_by, and AOI and calculate the relative proportions. Specifically, I want to summarise the grouped dur values where AOI has a capital letter A, B, or C as opposed to *. That latter part is what I'm asking help for. What I can do is summarise by all/any AOI value:

d %>%
  group_by(Speaker, Utterance, Gaze_by, AOI) %>%
  summarise(dur = sum(dur)) %>%
  mutate(prop = dur/sum(dur)*100)
# A tibble: 6 x 6
# Groups:   Speaker, Utterance, Gaze_by [2]
  Speaker Utterance Gaze_by AOI     dur  prop
  <chr>   <chr>     <chr>   <chr> <int> <dbl>
1 A       hi there  A       *        75 21.5 
2 A       hi there  A       B        30  8.60
3 A       hi there  A       C       244 69.9 
4 B       how're ya B       *       233 17.2 
5 B       how're ya B       A       122  9.01
6 B       how're ya B       C       999 73.8 

But what I'm really looking for is this:

  Speaker Utterance Gaze_by AOI     dur  prop
  <chr>   <chr>     <chr>   <chr> <int> <dbl>
1 A       hi there  A       *        75 21.5 
2 A       hi there  A       X       274 78.5       # X instead of A B, or C
3 B       how're ya B       *       233 17.2 
4 B       how're ya B       X      1122 82.8       # X instead of A B, or C

How can this be accomplished?

Reproducible data:

structure(list(Line = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L), Speaker = c("A", "A", "A", "A", "A", "B", "B", "B", "B", 
"B", "B"), Utterance = c("hi there", "hi there", "hi there", 
"hi there", "hi there", "how're ya", "how're ya", "how're ya", 
"how're ya", "how're ya", "how're ya"), Gaze_by = c("A", "A", 
"A", "A", "A", "B", "B", "B", "B", "B", "B"), AOI = c("C", "*", 
"B", "*", "C", "*", "A", "*", "A", "*", "C"), dur = c(100L, 25L, 
30L, 50L, 144L, 50L, 22L, 33L, 100L, 150L, 999L)), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Views: 34

Answers (1)

Karthik S
Karthik S

Reputation: 11584

Does this work:

df %>% group_by(Speaker, Gaze_by, 'AOI' = str_replace(AOI,'\\w','X')) %>% summarise(dur = sum(dur)) %>%
                                                            mutate(prop = dur/sum(dur)*100)
`summarise()` has grouped output by 'Speaker', 'Gaze_by'. You can override using the `.groups` argument.
# A tibble: 4 x 5
# Groups:   Speaker, Gaze_by [2]
  Speaker Gaze_by AOI     dur  prop
  <chr>   <chr>   <chr> <int> <dbl>
1 A       A       *        75  21.5
2 A       A       X       274  78.5
3 B       B       *       233  17.2
4 B       B       X      1121  82.8

Upvotes: 1

Related Questions