Reputation: 161
I am fairly new to R and DPLYR and I am stuck on a this issue:
I have two tables:
(1) Repairs done on cars
(2) Amount owed on each car over time
What I would like to do is create three extra columns on the repair table that gives me: (1) the amount owed on the car when the repair was done, (2) 3months down the road and (3) finally last payment record on file.
And if the case where the repair date does not match with any payment record, I need to use the closest amount owed on record.
So something like:
Any ideas how I can do that?
Here are the data frames:
Repairs done on cars:
df_repair <- data.frame(unique_id =
c("A1","A2","A3","A4","A5","A6","A7","A8"),
car_number = c(1,1,1,2,2,2,3,3),
repair_done = c("Front Fender","Front
Lights","Rear Lights","Front Fender", "Rear Fender","Rear Lights","Front
Lights","Front Fender"),
YearMonth = c("2014-03","2016-03","2016-07","2015-05","2015-08","2016-01","2018-01","2018-05"))
df_owed <- data.frame(car_number = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3),
YearMonth = c("2014-02","2014-05","2014-06","2014-08","2015-06","2015-12","2016-03","2016-04","2016-05","2016-06","2016-07","2016-08","2015-05","2015-08","2015-12","2016-03","2018-01","2018-02","2018-03","2018-04","2018-05","2018-09"),
amount_owed = c(20000,18000,17500,16000,10000,7000,6000,5500,5000,4500,4000,3000,10000,8000,6000,0,50000,40000,35000,30000,25000,15000))
Upvotes: 1
Views: 106
Reputation: 30494
Using zoo
for year-months, and tidyverse
, you could try the following. Using left_join
add all the df_owed
data to your df_repair
data, by the car_number
. You can convert your year-month columns to yearmon
objects with zoo
. Then, sort your rows by the year-month column from df_owed
.
For each unique_id
(using group_by
) you can create your three columns of interest. The first will use the latest amount_owed
where the owed date is prior to the service date. Then second (3 months) will use the first amount_owed
value where the owed date follows the service date by 3 months (3/12). Finally, the most recent take just the last
value from amount_owed
.
Using the example data, the results differ a bit, possibly due to the data frames not matching the images in the post.
library(tidyverse)
library(zoo)
df_repair %>%
left_join(df_owed, by = "car_number") %>%
mutate_at(c("YearMonth.x", "YearMonth.y"), as.yearmon) %>%
arrange(YearMonth.y) %>%
group_by(unique_id, car_number) %>%
summarise(
owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),
owed_3_months = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]),
owed_most_recent = last(amount_owed)
)
Upvotes: 1