emilBeBri
emilBeBri

Reputation: 666

Tricky merge with dates in r - expand rows from one df to match another

So I have two dataframes, in a tidy format:

  df1 <- data.frame(date=as.Date(paste0('2018-12-',c(11,15,18,22,25,29))), balance=c(-500,-250,0,250,-300,500), account='salary')
  df2 <- data.frame(date=as.Date(paste0('2018-12-',c(16,22,27))), balance=c(1000, 700, 250), account='budget')

Now, this is balance of my budget account and my daily account. the sum of the "balance" column from both thee dataframes would give the amount of money I have on any given day.

However, as there is only a row in the dataframe if a transfer that changes the balance is made, it complicates the computation. The merge has to be done, so for every time there is a row in the one dataframe, there has to be row in the other dataframe, that correponds to the balance in that account on that day. So the result in the toy example would be this:

  df.result <- data.frame(date=as.Date(paste0('2018-12-',c(11,15,16,18,22,25,27,29))), balance.salary=c(-500,-250,-250,0,250,-300,-300,500), balance.budget=c(1000,1000,1000,1000,700,700,250,250))

Notice how even though I don't have information for the budget-account from the first date that the salay-account has a row, I'm using the information from the first time there is a row from the budget account.

here I have changed the column names for the balance-variable, so that one row can have the balance for both, but this is not the essential part of the solution, only that the result can be computed like this:

  df.result$balance.total <- df.result$balance.salary + df.result$balance.budget

I have tried using crossing() as per this answer, Copying row from one df into everyone row in another, but isn't useful in this case, as far as I can tell.

Thank you.

PS I prefer a non-tidyverse solution - I'm a big fan of data.table myself - but beggars can't be choosers :)

Upvotes: 4

Views: 223

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389175

A mixture of base R and zoo solution. We first merge both the dataframe by date and fill the missing values using na.locf.

library(zoo)
na.locf(na.locf(merge(df1[-3], df2[-3], all = TRUE, by = "date")), fromLast = TRUE)


#        date balance.x balance.y
#1 2018-12-11      -500      1000
#2 2018-12-15      -250      1000
#3 2018-12-16      -250      1000
#4 2018-12-18         0      1000
#5 2018-12-22       250       700
#6 2018-12-25      -300       700
#7 2018-12-27      -300       250
#8 2018-12-29       500       250

Upvotes: 1

tmfmnk
tmfmnk

Reputation: 40151

I see that you prefer data.table, but maybe a tidyverse solution will be also useful:

df1 %>%
 rename(balance.salary = balance) %>%
 select(-account) %>%
 full_join(df2 %>%
            rename(balance.budget = balance) %>%
            select(-account), by = c("date" = "date")) %>%
 arrange(date) %>%
 fill(balance.salary, .direction = "down") %>%
 fill(balance.budget, .direction = "down") %>%
 fill(balance.budget, .direction = "up")

        date balance.salary balance.budget
1 2018-12-11           -500           1000
2 2018-12-15           -250           1000
3 2018-12-16           -250           1000
4 2018-12-18              0           1000
5 2018-12-22            250            700
6 2018-12-25           -300            700
7 2018-12-27           -300            250
8 2018-12-29            500            250

First, is renames the "balance" column in df1 to "balance.salary" and to "balance.budget" in df2. Second, it merges the two dfs on "date". Finally, it fills the missing values with the last non-NA value.

Upvotes: 4

Related Questions