Reputation: 666
So I have two dataframes, in a tidy format:
df1 <- data.frame(date=as.Date(paste0('2018-12-',c(11,15,18,22,25,29))), balance=c(-500,-250,0,250,-300,500), account='salary')
df2 <- data.frame(date=as.Date(paste0('2018-12-',c(16,22,27))), balance=c(1000, 700, 250), account='budget')
Now, this is balance of my budget account and my daily account. the sum of the "balance" column from both thee dataframes would give the amount of money I have on any given day.
However, as there is only a row in the dataframe if a transfer that changes the balance is made, it complicates the computation. The merge has to be done, so for every time there is a row in the one dataframe, there has to be row in the other dataframe, that correponds to the balance in that account on that day. So the result in the toy example would be this:
df.result <- data.frame(date=as.Date(paste0('2018-12-',c(11,15,16,18,22,25,27,29))), balance.salary=c(-500,-250,-250,0,250,-300,-300,500), balance.budget=c(1000,1000,1000,1000,700,700,250,250))
Notice how even though I don't have information for the budget-account from the first date that the salay-account has a row, I'm using the information from the first time there is a row from the budget account.
here I have changed the column names for the balance-variable, so that one row can have the balance for both, but this is not the essential part of the solution, only that the result can be computed like this:
df.result$balance.total <- df.result$balance.salary + df.result$balance.budget
I have tried using crossing() as per this answer, Copying row from one df into everyone row in another, but isn't useful in this case, as far as I can tell.
Thank you.
PS I prefer a non-tidyverse solution - I'm a big fan of data.table myself - but beggars can't be choosers :)
Upvotes: 4
Views: 223
Reputation: 389175
A mixture of base R and zoo
solution. We first merge
both the dataframe by date
and fill the missing values using na.locf
.
library(zoo)
na.locf(na.locf(merge(df1[-3], df2[-3], all = TRUE, by = "date")), fromLast = TRUE)
# date balance.x balance.y
#1 2018-12-11 -500 1000
#2 2018-12-15 -250 1000
#3 2018-12-16 -250 1000
#4 2018-12-18 0 1000
#5 2018-12-22 250 700
#6 2018-12-25 -300 700
#7 2018-12-27 -300 250
#8 2018-12-29 500 250
Upvotes: 1
Reputation: 40151
I see that you prefer data.table
, but maybe a tidyverse
solution will be also useful:
df1 %>%
rename(balance.salary = balance) %>%
select(-account) %>%
full_join(df2 %>%
rename(balance.budget = balance) %>%
select(-account), by = c("date" = "date")) %>%
arrange(date) %>%
fill(balance.salary, .direction = "down") %>%
fill(balance.budget, .direction = "down") %>%
fill(balance.budget, .direction = "up")
date balance.salary balance.budget
1 2018-12-11 -500 1000
2 2018-12-15 -250 1000
3 2018-12-16 -250 1000
4 2018-12-18 0 1000
5 2018-12-22 250 700
6 2018-12-25 -300 700
7 2018-12-27 -300 250
8 2018-12-29 500 250
First, is renames the "balance" column in df1 to "balance.salary" and to "balance.budget" in df2. Second, it merges the two dfs on "date". Finally, it fills the missing values with the last non-NA value.
Upvotes: 4