Reputation: 69
I need to create a new variable based on information from an observation further down in the data set that meets specific criteria. The matching observation could be the next row or several rows down.
I'm a beginner in R and haven't been able to make any progress on the solution.
I have a data frame (df) with the following variables:
event         event time
pass          10.10
failed block  10.20
failed check  10.21
reception     10.25
pass          17.60
reception     17.65
I need to create a variable called reception time that returns the time of the reception for each pass, so it looks like:
event         event time  reception time
pass          10.10       10.25
failed block  10.20       NA
failed check  10.21       NA
reception     10.25       NA
pass          17.60       17.65
reception     17.65       NA
There could be 50 or more lines in between pass and reception.
Upvotes: 2
Views: 138
Reputation: 562
If I understand your data correctly, adding a grouping variable (e.g. event_n) could be helpful for further analyses.
If reception is always the last occurrence before a new series of events, you could use the last() function from dplyr.
library(dplyr)
df <- tribble(
  ~event,         ~event_time,
  "pass",         10.10,
  "failed block", 10.20,
  "failed check", 10.21,
  "reception",    10.25,
  "pass",         17.60,
  "reception",    17.65)
df2 <- df %>%
  group_by(event) %>%
  mutate(event_n = sequence(n())) %>%
  ungroup()
df2
#> # A tibble: 6 x 3
#> event event_time event_n
#> <chr> <dbl> <int>
#> 1 pass 10.1 1
#> 2 failed block 10.2 1
#> 3 failed check 10.2 1
#> 4 reception 10.2 1
#> 5 pass 17.6 2
#> 6 reception 17.6 2
df2 %>%
  group_by(event_n) %>%
  mutate(reception = if_else(event == "pass", last(event_time), NA_real_))
#> # A tibble: 6 x 4
#> # Groups: event_n [2]
#> event event_time event_n reception
#> <chr> <dbl> <int> <dbl>
#> 1 pass 10.1 1 10.2
#> 2 failed block 10.2 1 NA
#> 3 failed check 10.2 1 NA
#> 4 reception 10.2 1 NA
#> 5 pass 17.6 2 17.6
#> 6 reception 17.6 2 NA
Created on 2019-08-08 by the reprex package (v0.3.0)
Does this answer work with your data?
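Numbering rows within each event type works for the example because every event type occurs at most once per series. As a hedged alternative, assuming every pass starts a new series, a running count of passes yields the same event_n without that restriction, and taking the first reception in each series avoids relying on the reception being the last row. This is only a sketch; df3 is a name introduced here for illustration.

```r
library(dplyr)

# Same example data as above.
df <- tribble(
  ~event,         ~event_time,
  "pass",         10.10,
  "failed block", 10.20,
  "failed check", 10.21,
  "reception",    10.25,
  "pass",         17.60,
  "reception",    17.65
)

df3 <- df %>%
  # Assumption: each "pass" starts a new series, so a running count of
  # passes labels every row with the series it belongs to.
  mutate(event_n = cumsum(event == "pass")) %>%
  group_by(event_n) %>%
  mutate(
    # First reception time within the series; NA if the series has none.
    reception = if_else(
      event == "pass",
      first(event_time[event == "reception"]),
      NA_real_
    )
  ) %>%
  ungroup()

df3
```

Rows that come before the first pass end up in series 0 and keep NA, since none of them is a pass.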
Upvotes: 1
Reputation: 2937
You may achieve what you need using the which function of base R, assuming two things: a pass occurs before a reception, and a reception that follows at a later point in time applies to the previous pass.
If that is the case (if not, please provide more details), then this should do:
# Define variables.
event <- as.factor(c("p", "fb", "fc", "r", "p", "r"))
time <- c(10.10, 10.20, 10.21, 10.25, 17.60, 17.65)
# Create data frame.
data <- data.frame(event, time)
data
# event time
# 1 p 10.10
# 2 fb 10.20
# 3 fc 10.21
# 4 r 10.25
# 5 p 17.60
# 6 r 17.65
# Create result column.
data$reception <- NA
data
# event time reception
# 1 p 10.10 NA
# 2 fb 10.20 NA
# 3 fc 10.21 NA
# 4 r 10.25 NA
# 5 p 17.60 NA
# 6 r 17.65 NA
# Compute.
data$reception[which(data$event == "p")] <- data$time[which(data$event == "r")]
data
# event time reception
# 1 p 10.10 10.25
# 2 fb 10.20 NA
# 3 fc 10.21 NA
# 4 r 10.25 NA
# 5 p 17.60 17.65
# 6 r 17.65 NA
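The one-line assignment above relies on there being exactly one r row for every p row, in matching order. If some passes are never received, a sketch along the following lines, under the same two assumptions, pairs each pass with the first reception that occurs before the next pass and leaves NA otherwise. The names pass_rows, reception_rows and window_end are introduced here for illustration.

```r
# Same toy data as above.
data <- data.frame(
  event = c("p", "fb", "fc", "r", "p", "r"),
  time  = c(10.10, 10.20, 10.21, 10.25, 17.60, 17.65)
)

pass_rows      <- which(data$event == "p")
reception_rows <- which(data$event == "r")

# Last row that still belongs to each pass: the row before the next pass,
# or the final row of the data for the last pass.
window_end <- c(pass_rows[-1] - 1, nrow(data))

data$reception <- NA_real_
data$reception[pass_rows] <- vapply(seq_along(pass_rows), function(i) {
  # Receptions after this pass and no later than the end of its window.
  hits <- reception_rows[reception_rows > pass_rows[i] &
                           reception_rows <= window_end[i]]
  if (length(hits) > 0) data$time[hits[1]] else NA_real_
}, numeric(1))

data
```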
Upvotes: 1
Reputation: 988
Welcome to Stack!
This is a bit clunky, but it works for your example.
df1
event event.time
1 pass 10.10
2 failed block 10.20
3 failed check 10.21
4 reception 10.25
5 pass 17.60
6 reception 17.65
Utilizing the packages dplyr for the pipes and tidyr for fill:
Code
library(dplyr)
library(tidyr)
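df2 <- df1 %>%
  # Copy the time onto reception rows only.
  mutate(reception.time = ifelse(event == "reception", event.time, NA)) %>%
  # Carry each reception time upward to the rows above it.
  fill(reception.time, .direction = "up") %>%
  # Keep the filled value on pass rows only.
  mutate(reception.time = ifelse(event == "pass", reception.time, NA))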
Output
df2
event event.time reception.time
1 pass 10.10 10.25
2 failed block 10.20 NA
3 failed check 10.21 NA
4 reception 10.25 NA
5 pass 17.60 17.65
6 reception 17.65 NA
Data
dput(df1)
df1 <- structure(list(event = c("pass", "failed block", "failed check",
"reception", "pass", "reception"), event.time = c(10.1, 10.2,
10.21, 10.25, 17.6, 17.65)), class = "data.frame", row.names = c(NA,
-6L))
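One caveat, offered as an assumption about how the real data might look: if a pass is never received, filling upward over the whole column would hand it the reception time of a later pass. A possible sketch that confines the fill to each pass's own stretch of rows follows; the helper column pass.id and the data frame df_gap are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the first pass is never received.
df_gap <- data.frame(
  event      = c("pass", "failed block", "pass", "reception"),
  event.time = c(10.10, 10.20, 17.60, 17.65)
)

df_gap2 <- df_gap %>%
  mutate(
    # Running count of passes: every row gets the id of the latest pass.
    pass.id        = cumsum(event == "pass"),
    reception.time = ifelse(event == "reception", event.time, NA_real_)
  ) %>%
  group_by(pass.id) %>%
  # On a grouped data frame, fill() works within each group, so a
  # reception time cannot leak upward into an earlier pass.
  fill(reception.time, .direction = "up") %>%
  ungroup() %>%
  mutate(reception.time = ifelse(event == "pass", reception.time, NA_real_)) %>%
  select(-pass.id)

df_gap2
```

Here the first pass keeps NA, while the second pass gets 17.65.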
Upvotes: 1