Reputation: 2152
Example data:
library(tidyverse)
cars <- tribble(
~make, ~year, ~hp, ~disp,
"Volvo", 2000, 111, 160,
"Volvo", 2004, 105, 161,
"Volvo", 2006, 93, 108,
"Honda", 2000, 110, 258,
"Honda", 2003, 175, 360,
"Honda", 2003, 105, 225,
"Honda", 2006, 245, 360,
"Ford", 2000, 62, 146,
"Ford", 2005, 95, 140,
"Ford", 2007, 123, 167,
"Ford", 2009, 123, 167
)
For each row, I want to add the columns prev_hp
and prev_disp
,
which should contain the hp
and disp
values from from the same make
, 2-4 years prior. The result should look like this:
tribble(
~make, ~year, ~hp, ~disp, ~prev_hp, ~prev_disp,
"Volvo", 2000, 111, 160, NA, NA,
"Volvo", 2004, 105, 161, 111, 160,
"Volvo", 2006, 93, 108, 105, 161,
"Honda", 2000, 110, 258, NA, NA,
"Honda", 2003, 175, 360, 110, 258,
"Honda", 2004, 105, 225, 110, 360,
"Honda", 2006, 245, 360, 175*, 225*,
"Ford", 2000, 62, 146, NA, NA,
"Ford", 2005, 95, 140, NA, NA,
"Ford", 2007, 123, 167, 95, 140,
"Ford", 2009, 123, 167, 123, 167
)
* For this row, two previous rows matches the criteria (Honda in 2003 and in 2004). In this case, the values from either match can be used, or even NA.
I've tried this:
cars %>%
rowwise() %>%
mutate(
prev_hp = cars[cars$make == make & (cars$year >= year - 4 | cars$year <= year - 2), 'hp'],
prev_disp = cars[cars$make == make & (cars$year >= year - 4 | cars$year <= year - 2), 'disp']
)
but it fails when there are more or less than exactly one matching previous row, and it's also very inefficient.
Is there a better way to solve this, or at least some way to fix my solution?
(the real dataset is 100k rows)
Upvotes: 1
Views: 44
Reputation: 21938
I'm not sure if this is the output you are looking for. Let's take a look:
library(dplyr)
cars <- tribble(
~make, ~year, ~hp, ~disp,
"Volvo", 2000, 111, 160,
"Volvo", 2004, 105, 161,
"Volvo", 2006, 93, 108,
"Honda", 2000, 110, 258,
"Honda", 2003, 175, 360,
"Honda", 2003, 105, 225,
"Honda", 2006, 245, 360,
"Ford", 2000, 62, 146,
"Ford", 2005, 95, 140,
"Ford", 2007, 123, 167,
"Ford", 2009, 123, 167
)
cars %>%
group_by(make) %>%
mutate(across(hp:disp, ~ lag(.x), .names = "prev_{.col}"),
across(prev_hp:prev_disp, ~ ifelse(year - lag(year) > 4, NA, .x)))
# A tibble: 11 x 6
# Groups: make [3]
make year hp disp prev_hp prev_disp
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Volvo 2000 111 160 NA NA
2 Volvo 2004 105 161 111 160
3 Volvo 2006 93 108 105 161
4 Honda 2000 110 258 NA NA
5 Honda 2003 175 360 110 258
6 Honda 2003 105 225 175 360
7 Honda 2006 245 360 105 225
8 Ford 2000 62 146 NA NA
9 Ford 2005 95 140 NA NA
10 Ford 2007 123 167 95 140
11 Ford 2009 123 167 123 167
Upvotes: 2