Mutate if multiple values occur in additional rows together

Question

I have a table of bank account transactions and I want to add a column flagging any transactions which are refunds (both the initial payment and the money coming back in) but for the life of me I can't figure out how to match the rows.

Sample data:

transactions <- tibble(date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03", "2020-01-04")),
                       merchant = c("A", "B", "B", "C", "A"),
                       amount = c(10, 5, -5, 9, 10))

Desired output:

# A tibble: 5 x 4
  date       merchant amount refund
              
1 2020-01-01 A            10 FALSE 
2 2020-01-02 B             5 TRUE  
3 2020-01-02 B            -5 TRUE  
4 2020-01-03 C             9 FALSE 
5 2020-01-04 A            10 FALSE

At the minute I'm only looking to match payments where the refund is on the same day, but I might change it later to include refunds that occurred any day after the payment. My first thought was to use something like below but it doesn't ensure that the values all occur in the same row. I'd prefer a tidyverse answer.

mutate(refund = if_else(date %in% date & 
                             merchant %in% merchant & 
                             (amount*-1) %in% amount,
                        T, F))

stlba · Accepted Answer

I had a brainwave while doing something else, it's a longer process than I was envisioning but posting my working solution here for completeness.

First I added a unique ID for each row (in reality my data already has this)

transactions <- tibble(date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03", "2020-01-04")),
                       merchant = c("A", "B", "B", "C", "A"),
                       amount = c(10, 5, -5, 9, 10)) %>%
     mutate(id = seq_len(nrow(.)))

Split the dataframe into two, depending on if money is going out (positive value) or out (negative) of the account. For money in, I mutated the amount column to be positive values instead so they can be matched.

transactions_out <- transactions %>%
     filter(amount > 0)

transactions_in <- transactions %>%
     filter(amount < 0) %>%
     mutate(amount = amount*-1)

Use inner_join() to match rows with the same date, merchant and amount in both rows. Then use pivot_longer() and pull() to save the IDs of these transactions.

In my actual data I've added extra lines here to refine the matches to reduce false positives.

refund_ids <- inner_join(transactions_out, transactions_in,
           by = c("date", "merchant", "amount"),
           suffix = c("_out", "_in")) %>%
     pivot_longer(starts_with("id"),
                  names_to = "type",
                  values_to = "id") %>%
     pull(id)

Finally, I add the new refund column to the original dataframe for the IDs in refund_ids:

transactions <- transactions %>%
     mutate(refund = if_else(id %in% refund_ids, T, F))

Output:

> transactions

# A tibble: 5 x 5
  date       merchant amount    id refund
               
1 2020-01-01 A            10     1 FALSE 
2 2020-01-02 B             5     2 TRUE  
3 2020-01-02 B            -5     3 TRUE  
4 2020-01-03 C             9     4 FALSE 
5 2020-01-04 A            10     5 FALSE

Mutate if multiple values occur in additional rows together

Answers (2)

Solution

Output

Related Questions