Reputation: 69

Create variable based on an observation further down in data set in R

I need to create a new variable that's based on information from an observation further down in the data set that meets a specific criterion. The criterion could be met in the next observation or several rows down.

I'm a beginner in R and haven't been able to make any progress on the solution.

I have a data frame (df) with the following variables:

event        event time  
pass            10.10  
failed block    10.20
failed check    10.21  
reception       10.25
pass            17.60
reception       17.65

I need to create a variable called reception time that returns the time of the reception for each pass, so it looks like:

event         event time   reception time  
pass            10.10          10.25  
failed block    10.20            NA
failed check    10.21            NA  
reception       10.25            NA  
pass            17.60          17.65  
reception       17.65            NA

There could be 50 or more lines in between pass and reception.

Upvotes: 2

Answers (3)

b_surial

Reputation: 562

If I understand your data correctly, adding a grouping variable (e.g. event_n) could be helpful for further analyses.

If reception is always the last occurrence before a new series of events, you could use the last() function from dplyr.

library(dplyr)

df <- tribble(
  ~event,        ~event_time,  
  "pass",        10.10,
  "failed block",10.20,
  "failed check",10.21,
  "reception",   10.25,
  "pass",        17.60,
  "reception",   17.65)

df2 <- df %>% 
  group_by(event) %>% 
  mutate(event_n = sequence(n())) %>% 
  ungroup()

df2
#> # A tibble: 6 x 3
#>   event        event_time event_n
#>   <chr>             <dbl>   <int>
#> 1 pass               10.1       1
#> 2 failed block       10.2       1
#> 3 failed check       10.2       1
#> 4 reception          10.2       1
#> 5 pass               17.6       2
#> 6 reception          17.6       2

df2 %>% 
  group_by(event_n) %>% 
  mutate(reception = if_else(event == "pass", last(event_time), NA_real_))
#> # A tibble: 6 x 4
#> # Groups:   event_n [2]
#>   event        event_time event_n reception
#>   <chr>             <dbl>   <int>     <dbl>
#> 1 pass               10.1       1      10.2
#> 2 failed block       10.2       1      NA  
#> 3 failed check       10.2       1      NA  
#> 4 reception          10.2       1      NA  
#> 5 pass               17.6       2      17.6
#> 6 reception          17.6       2      NA

Created on 2019-08-08 by the reprex package (v0.3.0)
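
One caveat about the grouping step: group_by(event) with sequence(n()) numbers the occurrences of each event type separately. That gives the right groups for the example data, but it would likely put rows in the wrong group as soon as the series differ in which events they contain. A minimal sketch of an alternative, assuming every series starts with a "pass" row, is to count passes with cumsum(). The data below is made up for illustration and is not from the question.

```r
library(dplyr)

# Hypothetical data: the second series has a "failed block", the first does not.
df <- tibble(
  event      = c("pass", "reception", "pass", "failed block", "reception"),
  event_time = c(10.10, 10.25, 17.60, 17.62, 17.65)
)

res <- df %>%
  # A new group starts at every "pass" row (assumption: each series opens with a pass).
  mutate(event_n = cumsum(event == "pass")) %>%
  group_by(event_n) %>%
  # As in the answer, this still assumes the reception is the last row of its group.
  mutate(reception = if_else(event == "pass", last(event_time), NA_real_)) %>%
  ungroup()

res
# event_n is 1, 1, 2, 2, 2; reception is 10.25 for the first pass and 17.65 for the second.
```

With the per-event numbering, the "failed block" row in this data would get event_n = 1 and become the last row of the first group, so the first pass would pick up its time instead of the reception's.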

Does this answer work with your data?

Upvotes: 1

Mihai

Reputation: 2937

You may achieve what you need using the which function of base R, assuming two things:

- your data frame always starts with a pass (i.e., a pass occurs before a reception)
- every reception that follows at a later point in time applies to the previous pass

If that is the case (if not, please provide more details), then this should do:

# Define variables.
event <- as.factor(c("p", "fb", "fc", "r", "p", "r"))
time <- c(10.10, 10.20, 10.21, 10.25, 17.60, 17.65)

# Create data frame.
data <- data.frame(event, time)
data

#   event  time
# 1     p 10.10
# 2    fb 10.20
# 3    fc 10.21
# 4     r 10.25
# 5     p 17.60
# 6     r 17.65

# Create result column.
data$reception <- NA
data

#   event  time reception
# 1     p 10.10        NA
# 2    fb 10.20        NA
# 3    fc 10.21        NA
# 4     r 10.25        NA
# 5     p 17.60        NA
# 6     r 17.65        NA

# Compute.
data$reception[which(data$event == "p")] <- data$time[which(data$event == "r")]
data

#   event  time reception
# 1     p 10.10     10.25
# 2    fb 10.20        NA
# 3    fc 10.21        NA
# 4     r 10.25        NA
# 5     p 17.60     17.65
# 6     r 17.65        NA
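
The assignment above pairs passes and receptions purely by position, so it also relies on there being exactly one reception per pass. If a pass can go without a reception, the two index vectors have different lengths and the pairing no longer lines up. A minimal base R sketch that tolerates this, assuming a reception belongs to the most recent pass before it, is shown below. The data and the helper names (is_pass, is_rec, grp) are made up for illustration.

```r
# Hypothetical data: the second pass (17.60) has no reception before the next pass.
data <- data.frame(
  event = c("p", "fb", "r", "p", "fc", "p", "r"),
  time  = c(10.10, 10.20, 10.25, 17.60, 17.62, 20.00, 20.40)
)

is_pass <- data$event == "p"
is_rec  <- data$event == "r"

# Group id: number of passes seen so far, so each pass opens a new group.
grp <- cumsum(is_pass)

# For each pass, look up the first reception in the same group.
# match() returns NA when the group has no reception.
data$reception <- NA_real_
data$reception[is_pass] <- data$time[is_rec][match(grp[is_pass], grp[is_rec])]

data
# The passes at 10.10 and 20.00 get 10.25 and 20.40; the pass at 17.60 stays NA.
```

Because match() returns only the first match, a group with several receptions keeps the earliest one.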

Upvotes: 1

Russ Thomas

Reputation: 988

Welcome to Stack!

This is a bit clunky, but it works for your example.

df1

         event event.time
1         pass      10.10
2 failed block      10.20
3 failed check      10.21
4    reception      10.25
5         pass      17.60
6    reception      17.65

This uses the dplyr package for the pipes and the tidyr package for fill().

Code

library(dplyr)
library(tidyr)

df2 <- df1 %>% 
  mutate(reception.time = ifelse(event == "reception", event.time, NA)) %>% 
  fill(reception.time, .direction = "up") %>% 
  mutate(reception.time = ifelse(event == "pass", reception.time, NA))

Output

df2

         event event.time reception.time
1         pass      10.10          10.25
2 failed block      10.20             NA
3 failed check      10.21             NA
4    reception      10.25             NA
5         pass      17.60          17.65
6    reception      17.65             NA

Data

dput(df1)

df1 <- structure(list(event = c("pass", "failed block", "failed check", 
"reception", "pass", "reception"), event.time = c(10.1, 10.2, 
10.21, 10.25, 17.6, 17.65)), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 1
