nateroe
nateroe

Reputation: 547

Calculating Highest In, First Out on trades

I am trying to use the Highest In, First Out accounting method on trades. Highest In, First Out means that when you sell, you sell your most expensive shares first.

Here are my buys and sells (example borrowed from R calculate aggregate gains or loss using FIFO method - this is a similar, but different problem):

buy = data.frame(BuyTransactionID = c(1:10),
                 Ticker=c(rep('MSFT',4),rep('AMZN',3),rep('DOCU',3)),
                 Date=c(rep('01-01-2018',2),rep('01-14-2020',2),rep('01-01-2018',2),rep('01-14-2020',1),'01-01-2018','03-15-2020','04-06-2020'),
                 Price=c(100,102,102,107,2000,2010,2011,197,182,167),
                 Quantity=c(10,10,5,5,1,1,2,12,15,15))

sell = data.frame(SellTransactionID=c(1:7),
                  Ticker=c('MSFT','MSFT','AMZN','AMZN','DOCU','DOCU','DOCU'),
                  Date=c('01-07-2020','01-20-2020','01-01-2020','01-30-2020','01-15-2020','04-10-2020','04-20-2020'),
                  Price=c(97,110,2100,2050,210,205,225),
                  Quantity=c(15,12,1,3,10,5,3))

Here are the rules:

  1. You sell the most expensive (highest price) shares first.
  2. You cannot sell shares before you purchased them
  3. You cannot sell the same shares multiple times

Example problem:

The first sale (SellTransactionID = 1) is 15 shares of MSFT on 01-07-2020. So, any purchase made before that date can be sold. Based on date, the eligible shares to be sold are those from BuyTransactionID 1 and 2. BuyTransactionID 2 is the highest price. Therefore, all 10 shares of BuyTransactionID 2 are sold and the remaining 5 shares come from BuyTransactionID 1.

Desired output:

'Date Sold' = the date sold (self-explanatory);

'Ticker' = the ticker sold;

'Proceeds' = the total dollar amount sold;

'Cost basis' = a weighted average of the shares sold.

Example solution:

This is the solution for SellTransactionID 1. A properly solution automates this and calculates for all SellTransactionIDs.

result <- data.frame(SellDate = '01-07-2020', Ticker = "MSFT", Proceeds = 1455, CostBasis = 101.33)

Cost Basis Example:

Cost basis is calculated as a weighted average. For the preceding example, Cost Basis is calculated as such: (Quantity1 * Price1 + Quanity2 * Price2 + .....)/sum of all Quantity(s)

So for example above: (10 * 102 + 5 * 100)/15

Upvotes: 3

Views: 200

Answers (3)

nateroe
nateroe

Reputation: 547

Here is the final working solution that I have come to with the help of @DPH. I have made a couple of changes to @DPH's edited solution.

  1. The edited solution does not work when all the shares of a stock are sold for multiple reasons including the dfh object. The updates solution does work with the modified dateset that I provided but not the original dateset. I have modified the answer so that it works when all shares are sold.
  2. I have modified the result to include the dates of the purchases. This will be important for determining whether the sale is long term or short term capital gains.
  3. I have removed tickers from stocks that have been purchased but not sold, as those will break the script
  4. I have applied to the updated solution to avoid looping.
  5. I have changed the base subsetting (i.e., df[]) to subsetting (i.e., df %>% filter(). For some reason the base subsetting was resulting in rows with NA values in my actual dataset even though it did not cause that in the sample dataset. The NA rows caused the solution not to work.

data frame prep:

df <- buy %>% filter(Ticker %in% unique(sell$Ticker)) %>% dplyr::mutate(io = "i") %>% 
  dplyr::rename(TID  = BuyTransactionID) %>% 
  dplyr::union(dplyr::mutate(sell, io = "o") %>% 
                 dplyr::rename(TID = SellTransactionID)) %>%
  # sort the data
  dplyr::arrange(Ticker, Date) %>% 
  # make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
  dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity), 
                TID = ifelse(io == "i", NA, TID),
                Date = lubridate::mdy(Date),
                hprice = NA_real_) %>% 
  dplyr::arrange(Ticker, Date) %>% 
  # group data to fill backwards per group
  dplyr::group_by(Ticker) %>% 
  tidyr::fill(TID, .direction = "up") %>%
  # ungroup to prevent unwanted behaviour downstream
  dplyr::ungroup()

df$Dates_bought <- NA

function and :

# rephrase loop as function
myfun <- function(i){
  # which ticker are we working with at this sale
  t <- unique(df[df$TID == i, ]$Ticker)
  # bind data from last sale of this ticker to current sale
  sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
               df %>% filter(TID == i))
  
  sdfh <- rbind(dfh %>% filter(Ticker == t),
                df %>% filter(TID == i & io == "i") %>% select(c("Ticker", "Date", "Price", "Quantity",  "io")))
  # current sales quantiy as positive value
  o1 <- abs(sdf %>% filter(io == "o") %>% pull(Quantity))
  # copy to use for greedy algo
  o2 <- o1
  # vectors of price and qtd of bought shares at this sale, having price in decreasing order
  ip <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
  iq <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
  date <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Date)
  ips <- ip
  iqs <- iq
  dates <- date
  # total value of shares in greedy
  v <- 0
  # loop to run over bought prices and quantities do reduce from sold qtd per sales block
  # check if shares block is larger then remaining sales qtd to break loop. Modifications 
  # to make dates work properly. 
  for(l in 1:length(ip)){
    if(o2 < iq[l]){
      v <- v + ip[l] * o2
      iqs[1] <- iqs[1] - o2
      dates2 <- if(o2 == 0) dates else dates[-1]
      break
    }else{
      o2 <- o2 - iq[l]
      v <- v + ip[l] * iq[l] 
      ips <- ips[-1]
      iqs <- iqs[-1]
      dates <- dates[-1]
      dates2 <- dates
    }
  }
  # Needs to have the if else statements because when all shares are sold, the length
  # of dates, ips, and iqs is 0, whereas Ticker and io are length 1. 
  dfh <<- rbind(dfh[dfh$Ticker != t, ],
                data.frame(Ticker = if(length(ips) == 0) numeric(length = 0L) else t,
                           Date = dates,
                           Price = ips, 
                           Quantity = iqs, 
                           io = if(length(ips) == 0) numeric(length = 0L) else "i"))
  
  # fill sales block frame and bind to output df
  dfo <<- rbind(dfo,
                data.frame(TID = i,
                           Ticker = t, 
                           Date = max(sdf$Date),
                           Dates_bought = paste(date[seq(length(date)-length(dates2))], collapse = ","),
                           Price = sum(ips * iqs) / sum(iqs), 
                           Quantity = sum(iqs), 
                           io = "i", 
                           hprice = v / o1))
}


# empty copy of df
dfo <- df[0, ] # empty copy of df
dfh <- df[0, c("Ticker", "Date", "Price", "Quantity", "io")] # to hold current stock aka portfolio

hifo <- purrr::map(sort(unique(df$TID)), ~ myfun(.x))[[length(unique(df$TID))]] %>% 
  dplyr::select(Ticker, Date, Dates_bought, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
  dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
  dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))

Let me know if anyone has any issues. I'd like to get this into a shinyapp and maybe develop it more. Let me know if you're interested in collaborating.

Upvotes: 1

DPH
DPH

Reputation: 4344

If I understood your problem correctly this is one possible solution. In resume I am combinig the sales and buys data and group it in sales blocks (given by the sales ID). This assumes that the order of sales IDs is according to the date column. I then loop over these sales blocks sequentially and write the intermediate result to a individual dataframe. For each sales block processing this result dataframe is filtered for the last sales block result of the same ticker. This means sales quantity must not be larger than available quantity according to the timeline (since you can not sell what you not have this should not be of concern anyhow I have to point it out as a possible limitation)

The proposed loop solution 1 is not the best way to work data in R since it is a loop, which grows a data.frame. Since you listed the purrr tag I adapted the code for the second part of the answer to work with the map() function.

Before we get to the actual coding lets prepare the data first (need for both parts of the answer the same way):

library(tidyverse) # need dplyr, using lubridate for date parsing and tidyr to fill data.frame

# include identifier variable, manipulate column name and union sells and buy
df <- dplyr::mutate(buy, io = "i") %>% 
    dplyr::rename(TID  = BuyTransactionID) %>% 
    dplyr::union(dplyr::mutate(sell, io = "o") %>% 
                     dplyr::rename(TID = SellTransactionID)) %>% 
    # sort the data
    dplyr::arrange(Ticker, Date) %>% 
    # make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
    dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity), 
                  TID = ifelse(io == "i", NA, TID),
                  Date = lubridate::mdy(Date),
                  hprice = NA_real_) %>% 
    # group data to fill backwards per group
    dplyr::group_by(Ticker) %>% 
    tidyr::fill(TID, .direction = "up") %>%
    # ungroup to prevent unwanted behaviour downstream
    dplyr::ungroup()

1 Standard loop

dfo <- df[0, ] # empty copy of df

for (i in sort(unique(df$TID))) {
    # which ticker are we working with at this sale
    t <- unique(df[df$TID == i, ]$Ticker)
    # bind data from last sale of this ticker to current sale
    sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
                 df[df$TID == i, ])
    # current sales quantiy as positive value
    o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
    # copy to use for greedy algo
    o2 <- o1
    # vectors of price and qtd of bought shares at this sale, having price in decreasing order
    ip <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
    iq <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
    # total value of shares in greedy
    v <- 0
    # loop to run over bought prices and quantities do reduce from sold qtd per sales block
    # check if shares block is larger then remaining sales qtd to break loop
    for(l in 1:length(ip)){
        if(o2 < iq[l]){
            v <- v + ip[l] * o2
            break
        }else{
            o2 <- o2 - iq[l]
            v <- v + ip[l] * iq[l] 
        }
    } 
    # fill sales block frame and bind to output df
    dfo <- rbind(dfo,
                 data.frame(TID = i,
                            Ticker = t, 
                            Date = max(sdf$Date),
                            Price = (sum(ip * iq) - v) / sum(sdf$Quantity), 
                            Quantity = sum(sdf$Quantity), 
                            io = "i", 
                            hprice = v / o1))
}

# format, join original data and calculate result per Sales block
dplyr::select(dfo, Ticker, Date, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
    dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
    dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))

  Ticker       Date Remain_Price Remain_Qtd Sales_Cost Sales_Price Sales_Qtd Result
1   MSFT 2020-01-07     100.0000          5   101.3333          97       -15    -65
2   MSFT 2020-01-20     100.0000          3   103.7500         110       -12     75
3   AMZN 2020-01-01    2000.0000          1  2010.0000        2100        -1     90
4   AMZN 2020-01-30          NaN          0  2007.3333        2050        -3    128
5   DOCU 2020-01-15     197.0000          2   197.0000         210       -10    130
6   DOCU 2020-04-10     173.6667         27   188.0000         205        -5     85
7   DOCU 2020-04-20       0.0000         -1   131.3333         225        -3    281

2 loop rephrase as purrr solution (be aware of the global assignment operartor (<<- instead of <-) for assignment of dfo at end of function)

# rephrase loop as function
myfun <- function(i){
    # which ticker are we working with at this sale
    t <- unique(df[df$TID == i, ]$Ticker)
    # bind data from last sale of this ticker to current sale
    sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
                 df[df$TID == i, ])
    # current sales quantiy as positive value
    o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
    # copy to use for greedy algo
    o2 <- o1
    # vectors of price and qtd of bought shares at this sale, having price in decreasing order
    ip <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
    iq <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
    # total value of shares in greedy
    v <- 0
    # loop to run over bought prices and quantities do reduce from sold qtd per sales block
    # check if shares block is larger then remaining sales qtd to break loop
    for(l in 1:length(ip)){
        if(o2 < iq[l]){
            v <- v + ip[l] * o2
            break
        }else{
            o2 <- o2 - iq[l]
            v <- v + ip[l] * iq[l] 
        }
    } 
    # fill sales block frame and bind to output df
    dfo <<- rbind(dfo,
                 data.frame(TID = i,
                            Ticker = t, 
                            Date = max(sdf$Date),
                            Price = (sum(ip * iq) - v) / sum(sdf$Quantity), 
                            Quantity = sum(sdf$Quantity), 
                            io = "i", 
                            hprice = v / o1))
}


# empty copy of df
dfo <- df[0, ]

purrr::map(sort(unique(df$TID)), ~ myfun(.x))[[max(df$TID)]] %>% 
    dplyr::select(Ticker, Date, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
    dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
    dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))

  Ticker       Date Remain_Price Remain_Qtd Sales_Cost Sales_Price Sales_Qtd Result
1   MSFT 2020-01-07     100.0000          5   101.3333          97       -15    -65
2   MSFT 2020-01-20     100.0000          3   103.7500         110       -12     75
3   AMZN 2020-01-01    2000.0000          1  2010.0000        2100        -1     90
4   AMZN 2020-01-30          NaN          0  2007.3333        2050        -3    128
5   DOCU 2020-01-15     197.0000          2   197.0000         210       -10    130
6   DOCU 2020-04-10     173.6667         27   188.0000         205        -5     85
7   DOCU 2020-04-20       0.0000         -1   131.3333         225        -3    281

EDIT

To keep track of the remaining stocks we need a second df to hold the current portfolio data. I did not optimize the code and editted only the loop, the purrr adaption should be pretty straight foreward though.

library(tidyverse) # need dplyr, using lubridate for date parsing and tidyr to fill data.frame

# include identifier variable, manipulate column name and union sells and buy
df <- dplyr::mutate(buy, io = "i") %>% 
    dplyr::rename(TID  = BuyTransactionID) %>% 
    dplyr::union(dplyr::mutate(sell, io = "o") %>% 
                     dplyr::rename(TID = SellTransactionID)) %>% 
    # sort the data
    dplyr::arrange(Ticker, Date) %>% 
    # make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
    dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity), 
                  TID = ifelse(io == "i", NA, TID),
                  Date = lubridate::mdy(Date),
                  hprice = NA_real_) %>% 
    # group data to fill backwards per group
    dplyr::group_by(Ticker) %>% 
    tidyr::fill(TID, .direction = "up") %>%
    # ungroup to prevent unwanted behaviour downstream
    dplyr::ungroup() 

dfo <- df[0, ] # empty copy of df
dfh <- df[0, c("Ticker", "Price", "Quantity", "io")] # to hold current stock aka portfolio

for (i in sort(unique(df$TID))) {
    # which ticker are we working with at this sale
    t <- unique(df[df$TID == i, ]$Ticker)
    # bind data from last sale of this ticker to current sale
    sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
                 df[df$TID == i, ])
    # bind data from current portfolio to buys between last and current sale (new port folio before sale)
    sdfh <- rbind(dfh[dfh$Ticker == t, ],
                  df[df$TID == i & df$io == "i", c("Ticker", "Price", "Quantity", "io")])
              
    # current sales quantiy as positive value
    o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
    # copy to use for greedy algo
    o2 <- o1
    # vectors of price and qtd of bought shares at this sale, having price in decreasing order
    ip <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
    iq <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
    ips <- ip
    iqs <- iq
    # total value of shares in greedy
    v <- 0
    # loop to run over bought prices and quantities do reduce from sold qtd per sales block
    # check if shares block is larger then remaining sales qtd to break loop
    for(l in 1:length(ip)){
        if(o2 < iq[l]){
            v <- v + ip[l] * o2
            iqs[1] <- iqs[1] - o2
            break
        }else{
            o2 <- o2 - iq[l]
            v <- v + ip[l] * iq[l] 
            ips <- ips[-1]
            iqs <- iqs[-1]
        }
    } 
    dfh <- rbind(dfh[dfh$Ticker != t, ],
                 data.frame(Ticker = t, 
                            Price = ips, 
                            Quantity = iqs, 
                            io = "i"))
    # fill sales block frame and bind to output df
    dfo <- rbind(dfo,
                 data.frame(TID = i,
                            Ticker = t, 
                            Date = max(sdf$Date),
                            Price = sum(ips * iqs) / sum(iqs), 
                            Quantity = sum(iqs), 
                            io = "i",
                            hprice = v/o1))
}

dfo
  TID Ticker       Date    Price Quantity io hprice
1   1   MSFT 2020-01-07 106.5652       69  i    115
2   2   MSFT 2020-01-20 105.0000       57  i    114
3   3   MSFT 2020-01-21 104.8750       56  i    112
4   4   MSFT 2020-01-22 104.1765       51  i    112

Upvotes: 1

nateroe
nateroe

Reputation: 547

The answer by @DPH is excellent, but unfortunately not quite accurate enough. I will explain why.

Here is a new dataset where all the purchases precede the sales:

buy = data.frame(BuyTransactionID = c(1:10),
                 Ticker=c(rep('MSFT',10)),
                 Date=c(rep('01-01-2020',10)),
                 Price=c(100,102,102,107,105,111,109,112,115,106),
                 Quantity=c(10,10,5,5,1,1,2,12,15,15))


sell = data.frame(SellTransactionID=c(1:4),
                  Ticker=c('MSFT','MSFT','MSFT', 'MSFT'),
                  Date=c('01-07-2020','01-20-2020','01-21-2020', 
                  '01-22-2020'),
                  Price=c(120,119,117, 121),
                  Quantity=c(7,12,1, 5))

If you apply the solution from @DPH, you will get this result: enter image description here

Notice that the 'Remain_Price' does not change, nor does the 'Sales_Cost' for the last three transactions. This happens because the function determines how many shares remain after the first sale and what the average price of the remaining shares is. The shares purchased preceding the first sale can no longer be sold individually. They are now treated as a single entity with an average price and the remaining number of shares.

For example, a total of 76 shares were bought in this example. The first sale sells 7 shares. Now, 69 shares remain as seen in 'Remain_Qtd'. An average price is calculated for those remaining shares - that price is $106.5652. Now, the process considers all 69 shares to be priced at $106.5652 and the remaining sales reduce the quantity of 'Remain_Qtd', but does not change the 'Remain_Price'. The remaining shares can no longer be considered at the price that they were bought at, they are collectively part of the remaining shares and the average remaining price.

This occurs because of the object dfo and the recycling of dfo in the object sdf. In particular, this line calculates an average remaining price that is then recycled through dfo and sdf.

Price = (sum(ip * iq) - v) / sum(sdf$Quantity)

and Quantity = sum(sdf$Quantity) adds together all the remaining shares.

I think the answer by @DPH is brilliant, but hope that it can be modified to treat each purchase individually rather than aggregating past purchases.

Upvotes: 1

Related Questions