Reputation: 21432
I have this dataframe; specifically, the column phase distinguishes alphabetical groups with values from A through E, and the column duration contains the durations of these phases:
df
   speaker action                                               duration phase
39 ID2-A   <no: yeah: it 's:>=                                     0.872 <NA>
40 ID1-G   ((m: r hand holds up three fingers ifo face))           1.769 A  # group 1
41 ID1-P   ((m: r hand holds up three fingers ifo face)) prep      0.679 B  # group 1
42 ID1-P   ((m: r hand holds up three fingers ifo face)) stroke    0.399 C  # group 1
56 ID1-A   °>that 's crazy<° anyway=                               1.514 <NA>
57 ID1-G   ((m: r hand airslaps))                                  0.938 A  # group 2
58 ID1-P   ((m: r hand airslaps)) prep                             0.299 B  # group 2
59 ID1-P   ((m: r hand airslaps)) stroke                           0.261 C  # group 2
60 ID1-P   ((m: r hand airslaps)) relax                            0.374 E  # group 2
61 <NA>    (0.057)                                                 0.057 <NA>
62 ID2-A   =yeah >I don 't know<                                   0.582 <NA>
I want to create new columns prep, stroke, hold, and relax, filled, where available, with the duration values for phases B, C, D (not shown in the example), and E, all placed side by side on the same row as the respective A value. The expected output is this:
df
   speaker action                                               duration phase  prep stroke hold relax
39 ID2-A   <no: yeah: it 's:>=                                     0.872 <NA>     NA     NA   NA    NA
40 ID1-G   ((m: r hand holds up three fingers ifo face))           1.769 A     0.679  0.399   NA    NA
41 ID1-P   ((m: r hand holds up three fingers ifo face)) prep      0.679 B        NA     NA   NA    NA
42 ID1-P   ((m: r hand holds up three fingers ifo face)) stroke    0.399 C        NA     NA   NA    NA
56 ID1-A   °>that 's crazy<° anyway=                               1.514 <NA>     NA     NA   NA    NA
57 ID1-G   ((m: r hand airslaps))                                  0.938 A     0.299  0.261   NA 0.374
58 ID1-P   ((m: r hand airslaps)) prep                             0.299 B        NA     NA   NA    NA
59 ID1-P   ((m: r hand airslaps)) stroke                           0.261 C        NA     NA   NA    NA
60 ID1-P   ((m: r hand airslaps)) relax                            0.374 E        NA     NA   NA    NA
61 <NA>    (0.057)                                                 0.057 <NA>     NA     NA   NA    NA
62 ID2-A   =yeah >I don 't know<                                   0.582 <NA>     NA     NA   NA    NA
To get there, I've created the new columns using ifelse:
df$prep <- ifelse(df$phase=="B", df$duration, NA)
df$stroke <- ifelse(df$phase=="C", df$duration, NA)
df$hold <- ifelse(df$phase=="D", df$duration, NA)
df$relax <- ifelse(df$phase=="E", df$duration, NA)
This works fine. The problematic part is the transfer of the duration values to the A rows. I've tried using lead, for example:
library(dplyr)
df$prep <- lead(df$prep, 1)
The issue here is that the number of positions to lead can vary if not all 5 phases A through E are present in a group. For example, if the D phase is missing, as in the example (cf. rows 59-60), then the number of positions to lead the duration value associated with phase E is not 4 but 3.
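A minimal illustration of the varying offset, using only the four phase rows of group 2 from the data above (the vectors phase, duration and relax are set up just for this sketch):

```r
library(dplyr)

# Group 2 from the example: phase D is missing
phase    <- c("A", "B", "C", "E")
duration <- c(0.938, 0.299, 0.261, 0.374)

# Same ifelse step as above, applied to the standalone vectors
relax <- ifelse(phase == "E", duration, NA)

# Offset for a complete A-E group: the E value is shifted past the A row
lead_complete <- lead(relax, 4)

# Offset that happens to fit this group, because D is absent
lead_short <- lead(relax, 3)
```

Here lead_complete contains only NA, while lead_short carries 0.374 in the first position, i.e. on the A row.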
Any suggestion for how to solve this?
Reproducible data:
# output of dput(t[c(39:42,56:62), c(2:3,5:6)]), assigned to df
df <- structure(list(speaker = c("ID2-A", "ID1-G", "ID1-P", "ID1-P",
"ID1-A", "ID1-G", "ID1-P", "ID1-P", "ID1-P", NA, "ID2-A"), action = c(" <no: yeah: it 's:>=",
" ((m: r hand holds up three fingers ifo face))", " ((m: r hand holds up three fingers ifo face)) prep",
" ((m: r hand holds up three fingers ifo face)) stroke", " °>that 's crazy<° anyway=",
" ((m: r hand airslaps))", " ((m: r hand airslaps)) prep",
" ((m: r hand airslaps)) stroke", " ((m: r hand airslaps)) relax",
"(0.057)", " =yeah >I don 't know<"), duration = c(0.872, 1.769,
0.679, 0.399, 1.514, 0.938, 0.299, 0.261, 0.374, 0.057, 0.582
), phase = c(NA, "A", "B", "C", NA, "A", "B", "C", "E", NA, NA
)), row.names = c(39L, 40L, 41L, 42L, 56L, 57L, 58L, 59L, 60L,
61L, 62L), class = "data.frame")
EDIT:
The phase groups are not always neatly separated by NA, as in this example:
df
   speaker action                                      duration phase
29 <NA>    canceled                                       3.672 <NA>
30 ID1-G   ((m: r hand imitates throwing away))           1.478 A
31 ID1-P   ((m: r hand imitates throwing away)) prep      0.254 B
32 ID1-P   ((m: r hand imitates throwing away)) stroke    0.775 C
33 ID1-P   ((m: r hand imitates throwing away)) hold      0.450 D
34 ID1-G   ((m: r hand nods))                             1.584 A
35 ID1-P   ((m: r hand nods)) prep                        0.466 B
36 ID1-P   ((m: r hand nods)) stroke                      0.324 C
37 ID1-P   ((m: r hand nods)) relax                       0.785 E
38 <NA>    (0.071)                                        0.071 <NA>
Upvotes: 0
Views: 61
Reputation: 56219
Reshape the data from long to wide with pivot_wider, then merge the result back onto the original rows with left_join:
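library(tidyverse)

# add a group column: the counter increases at every NA row,
# and the NA rows themselves get no group
x <- df %>%
  mutate(grp = cumsum(is.na(phase)),
         grp = ifelse(is.na(phase), NA, grp))

# reshape: one row per group, one column per phase
y <- x %>%
  select(grp, phase, duration) %>%
  filter(!is.na(grp)) %>%
  pivot_wider(id_cols = grp, names_from = phase, values_from = duration) %>%
  mutate(phase = "A")

# merge: the wide values land only on the "A" row of each group
left_join(x, y, by = c("phase", "grp"))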
#    speaker action                                               duration phase grp     A     B     C     E
#  1 ID2-A   <no: yeah: it 's:>=                                     0.872 <NA>   NA    NA    NA    NA    NA
#  2 ID1-G   ((m: r hand holds up three fingers ifo face))           1.769 A       1 1.769 0.679 0.399    NA
#  3 ID1-P   ((m: r hand holds up three fingers ifo face)) prep      0.679 B       1    NA    NA    NA    NA
#  4 ID1-P   ((m: r hand holds up three fingers ifo face)) stroke    0.399 C       1    NA    NA    NA    NA
#  5 ID1-A   °>that 's crazy<° anyway=                               1.514 <NA>   NA    NA    NA    NA    NA
#  6 ID1-G   ((m: r hand airslaps))                                  0.938 A       2 0.938 0.299 0.261 0.374
#  7 ID1-P   ((m: r hand airslaps)) prep                             0.299 B       2    NA    NA    NA    NA
#  8 ID1-P   ((m: r hand airslaps)) stroke                           0.261 C       2    NA    NA    NA    NA
#  9 ID1-P   ((m: r hand airslaps)) relax                            0.374 E       2    NA    NA    NA    NA
# 10 <NA>    (0.057)                                                 0.057 <NA>   NA    NA    NA    NA    NA
# 11 ID2-A   =yeah >I don 't know<                                   0.582 <NA>   NA    NA    NA    NA    NA
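Note that cumsum(is.na(phase)) relies on NA rows separating the groups, which does not hold for the data in the question's EDIT. A possible variant, sketched here under the assumption that every group starts with an A row, opens a new group at each A instead, and also renames the phase columns to the names asked for. The data frame df2 (the EDIT example reduced to two columns) and the lookup vector new_names are set up only for this illustration:

```r
library(dplyr)
library(tidyr)

# EDIT example from the question, reduced to the two columns used here
df2 <- data.frame(
  duration = c(3.672, 1.478, 0.254, 0.775, 0.450,
               1.584, 0.466, 0.324, 0.785, 0.071),
  phase    = c(NA, "A", "B", "C", "D", "A", "B", "C", "E", NA)
)

# Hypothetical lookup: new column name = phase letter
new_names <- c(prep = "B", stroke = "C", hold = "D", relax = "E")

# Assumption: a group starts at each "A"; NA rows belong to no group
x <- df2 %>%
  mutate(grp = cumsum(phase %in% "A"),
         grp = ifelse(is.na(phase), NA, grp))

# One row per group with the B-E durations side by side
y <- x %>%
  filter(!is.na(grp), phase != "A") %>%
  select(grp, phase, duration) %>%
  pivot_wider(id_cols = grp, names_from = phase, values_from = duration) %>%
  rename(any_of(new_names)) %>%   # skips phases that never occur
  mutate(phase = "A")

# Attach the wide values to the "A" row of each group only
res <- left_join(x, y, by = c("phase", "grp"))
```

On the EDIT data this puts 0.254, 0.775 and 0.450 into prep, stroke and hold of the first A row, and 0.466, 0.324 and 0.785 into prep, stroke and relax of the second; every other cell of the new columns is NA. A phase that never occurs in the data (such as D in the original example) produces no column at all with this approach.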
Upvotes: 2