Reputation: 547
I am trying to use the Highest In, First Out accounting method on trades. Highest In, First Out means that when you sell, you sell your most expensive shares first.
Here are my buys and sells (example borrowed from R calculate aggregate gains or loss using FIFO method - this is a similar, but different problem):
buy = data.frame(BuyTransactionID = c(1:10),
Ticker=c(rep('MSFT',4),rep('AMZN',3),rep('DOCU',3)),
Date=c(rep('01-01-2018',2),rep('01-14-2020',2),rep('01-01-2018',2),rep('01-14-2020',1),'01-01-2018','03-15-2020','04-06-2020'),
Price=c(100,102,102,107,2000,2010,2011,197,182,167),
Quantity=c(10,10,5,5,1,1,2,12,15,15))
sell = data.frame(SellTransactionID=c(1:7),
Ticker=c('MSFT','MSFT','AMZN','AMZN','DOCU','DOCU','DOCU'),
Date=c('01-07-2020','01-20-2020','01-01-2020','01-30-2020','01-15-2020','04-10-2020','04-20-2020'),
Price=c(97,110,2100,2050,210,205,225),
Quantity=c(15,12,1,3,10,5,3))
Here are the rules:
Example problem:
The first sale (SellTransactionID = 1) is 15 shares of MSFT on 01-07-2020. So, any purchase made before that date can be sold. Based on date, the eligible shares to be sold are those from BuyTransactionID 1 and 2. BuyTransactionID 2 is the highest price. Therefore, all 10 shares of BuyTransactionID 2 are sold and the remaining 5 shares come from BuyTransactionID 1.
Desired output:
'Date Sold' = the date sold (self-explanatory);
'Ticker' = the ticker sold;
'Proceeds' = the total dollar amount sold;
'Cost basis' = a weighted average of the shares sold.
Example solution:
This is the solution for SellTransactionID 1. A properly solution automates this and calculates for all SellTransactionIDs.
result <- data.frame(SellDate = '01-07-2020', Ticker = "MSFT", Proceeds = 1455, CostBasis = 101.33)
Cost Basis Example:
Cost basis is calculated as a weighted average. For the preceding example, Cost Basis is calculated as such: (Quantity1 * Price1 + Quanity2 * Price2 + .....)/sum of all Quantity(s)
So for example above: (10 * 102 + 5 * 100)/15
Upvotes: 3
Views: 200
Reputation: 547
Here is the final working solution that I have come to with the help of @DPH. I have made a couple of changes to @DPH's edited solution.
dfh
object. The updates solution does work with the modified dateset that I provided but not the original dateset. I have modified the answer so that it works when all shares are sold.df[]
) to dplyr subsetting (i.e., df %>% filter()
. For some reason the base subsetting was resulting in rows with NA values in my actual dataset even though it did not cause that in the sample dataset. The NA rows caused the solution not to work.data frame prep:
df <- buy %>% filter(Ticker %in% unique(sell$Ticker)) %>% dplyr::mutate(io = "i") %>%
dplyr::rename(TID = BuyTransactionID) %>%
dplyr::union(dplyr::mutate(sell, io = "o") %>%
dplyr::rename(TID = SellTransactionID)) %>%
# sort the data
dplyr::arrange(Ticker, Date) %>%
# make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity),
TID = ifelse(io == "i", NA, TID),
Date = lubridate::mdy(Date),
hprice = NA_real_) %>%
dplyr::arrange(Ticker, Date) %>%
# group data to fill backwards per group
dplyr::group_by(Ticker) %>%
tidyr::fill(TID, .direction = "up") %>%
# ungroup to prevent unwanted behaviour downstream
dplyr::ungroup()
df$Dates_bought <- NA
function and purrr:
# rephrase loop as function
myfun <- function(i){
# which ticker are we working with at this sale
t <- unique(df[df$TID == i, ]$Ticker)
# bind data from last sale of this ticker to current sale
sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"),
df %>% filter(TID == i))
sdfh <- rbind(dfh %>% filter(Ticker == t),
df %>% filter(TID == i & io == "i") %>% select(c("Ticker", "Date", "Price", "Quantity", "io")))
# current sales quantiy as positive value
o1 <- abs(sdf %>% filter(io == "o") %>% pull(Quantity))
# copy to use for greedy algo
o2 <- o1
# vectors of price and qtd of bought shares at this sale, having price in decreasing order
ip <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
iq <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
date <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Date)
ips <- ip
iqs <- iq
dates <- date
# total value of shares in greedy
v <- 0
# loop to run over bought prices and quantities do reduce from sold qtd per sales block
# check if shares block is larger then remaining sales qtd to break loop. Modifications
# to make dates work properly.
for(l in 1:length(ip)){
if(o2 < iq[l]){
v <- v + ip[l] * o2
iqs[1] <- iqs[1] - o2
dates2 <- if(o2 == 0) dates else dates[-1]
break
}else{
o2 <- o2 - iq[l]
v <- v + ip[l] * iq[l]
ips <- ips[-1]
iqs <- iqs[-1]
dates <- dates[-1]
dates2 <- dates
}
}
# Needs to have the if else statements because when all shares are sold, the length
# of dates, ips, and iqs is 0, whereas Ticker and io are length 1.
dfh <<- rbind(dfh[dfh$Ticker != t, ],
data.frame(Ticker = if(length(ips) == 0) numeric(length = 0L) else t,
Date = dates,
Price = ips,
Quantity = iqs,
io = if(length(ips) == 0) numeric(length = 0L) else "i"))
# fill sales block frame and bind to output df
dfo <<- rbind(dfo,
data.frame(TID = i,
Ticker = t,
Date = max(sdf$Date),
Dates_bought = paste(date[seq(length(date)-length(dates2))], collapse = ","),
Price = sum(ips * iqs) / sum(iqs),
Quantity = sum(iqs),
io = "i",
hprice = v / o1))
}
# empty copy of df
dfo <- df[0, ] # empty copy of df
dfh <- df[0, c("Ticker", "Date", "Price", "Quantity", "io")] # to hold current stock aka portfolio
hifo <- purrr::map(sort(unique(df$TID)), ~ myfun(.x))[[length(unique(df$TID))]] %>%
dplyr::select(Ticker, Date, Dates_bought, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))
Let me know if anyone has any issues. I'd like to get this into a shinyapp and maybe develop it more. Let me know if you're interested in collaborating.
Upvotes: 1
Reputation: 4344
If I understood your problem correctly this is one possible solution. In resume I am combinig the sales and buys data and group it in sales blocks (given by the sales ID). This assumes that the order of sales IDs is according to the date column. I then loop over these sales blocks sequentially and write the intermediate result to a individual dataframe. For each sales block processing this result dataframe is filtered for the last sales block result of the same ticker. This means sales quantity must not be larger than available quantity according to the timeline (since you can not sell what you not have this should not be of concern anyhow I have to point it out as a possible limitation)
The proposed loop solution 1 is not the best way to work data in R since it is a loop, which grows a data.frame. Since you listed the purrr
tag I adapted the code for the second part of the answer to work with the map() function.
Before we get to the actual coding lets prepare the data first (need for both parts of the answer the same way):
library(tidyverse) # need dplyr, using lubridate for date parsing and tidyr to fill data.frame
# include identifier variable, manipulate column name and union sells and buy
df <- dplyr::mutate(buy, io = "i") %>%
dplyr::rename(TID = BuyTransactionID) %>%
dplyr::union(dplyr::mutate(sell, io = "o") %>%
dplyr::rename(TID = SellTransactionID)) %>%
# sort the data
dplyr::arrange(Ticker, Date) %>%
# make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity),
TID = ifelse(io == "i", NA, TID),
Date = lubridate::mdy(Date),
hprice = NA_real_) %>%
# group data to fill backwards per group
dplyr::group_by(Ticker) %>%
tidyr::fill(TID, .direction = "up") %>%
# ungroup to prevent unwanted behaviour downstream
dplyr::ungroup()
1 Standard loop
dfo <- df[0, ] # empty copy of df
for (i in sort(unique(df$TID))) {
# which ticker are we working with at this sale
t <- unique(df[df$TID == i, ]$Ticker)
# bind data from last sale of this ticker to current sale
sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"),
df[df$TID == i, ])
# current sales quantiy as positive value
o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
# copy to use for greedy algo
o2 <- o1
# vectors of price and qtd of bought shares at this sale, having price in decreasing order
ip <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
iq <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
# total value of shares in greedy
v <- 0
# loop to run over bought prices and quantities do reduce from sold qtd per sales block
# check if shares block is larger then remaining sales qtd to break loop
for(l in 1:length(ip)){
if(o2 < iq[l]){
v <- v + ip[l] * o2
break
}else{
o2 <- o2 - iq[l]
v <- v + ip[l] * iq[l]
}
}
# fill sales block frame and bind to output df
dfo <- rbind(dfo,
data.frame(TID = i,
Ticker = t,
Date = max(sdf$Date),
Price = (sum(ip * iq) - v) / sum(sdf$Quantity),
Quantity = sum(sdf$Quantity),
io = "i",
hprice = v / o1))
}
# format, join original data and calculate result per Sales block
dplyr::select(dfo, Ticker, Date, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))
Ticker Date Remain_Price Remain_Qtd Sales_Cost Sales_Price Sales_Qtd Result
1 MSFT 2020-01-07 100.0000 5 101.3333 97 -15 -65
2 MSFT 2020-01-20 100.0000 3 103.7500 110 -12 75
3 AMZN 2020-01-01 2000.0000 1 2010.0000 2100 -1 90
4 AMZN 2020-01-30 NaN 0 2007.3333 2050 -3 128
5 DOCU 2020-01-15 197.0000 2 197.0000 210 -10 130
6 DOCU 2020-04-10 173.6667 27 188.0000 205 -5 85
7 DOCU 2020-04-20 0.0000 -1 131.3333 225 -3 281
2 loop rephrase as purrr solution (be aware of the global assignment operartor (<<- instead of <-) for assignment of dfo at end of function)
# rephrase loop as function
myfun <- function(i){
# which ticker are we working with at this sale
t <- unique(df[df$TID == i, ]$Ticker)
# bind data from last sale of this ticker to current sale
sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"),
df[df$TID == i, ])
# current sales quantiy as positive value
o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
# copy to use for greedy algo
o2 <- o1
# vectors of price and qtd of bought shares at this sale, having price in decreasing order
ip <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
iq <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
# total value of shares in greedy
v <- 0
# loop to run over bought prices and quantities do reduce from sold qtd per sales block
# check if shares block is larger then remaining sales qtd to break loop
for(l in 1:length(ip)){
if(o2 < iq[l]){
v <- v + ip[l] * o2
break
}else{
o2 <- o2 - iq[l]
v <- v + ip[l] * iq[l]
}
}
# fill sales block frame and bind to output df
dfo <<- rbind(dfo,
data.frame(TID = i,
Ticker = t,
Date = max(sdf$Date),
Price = (sum(ip * iq) - v) / sum(sdf$Quantity),
Quantity = sum(sdf$Quantity),
io = "i",
hprice = v / o1))
}
# empty copy of df
dfo <- df[0, ]
purrr::map(sort(unique(df$TID)), ~ myfun(.x))[[max(df$TID)]] %>%
dplyr::select(Ticker, Date, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))
Ticker Date Remain_Price Remain_Qtd Sales_Cost Sales_Price Sales_Qtd Result
1 MSFT 2020-01-07 100.0000 5 101.3333 97 -15 -65
2 MSFT 2020-01-20 100.0000 3 103.7500 110 -12 75
3 AMZN 2020-01-01 2000.0000 1 2010.0000 2100 -1 90
4 AMZN 2020-01-30 NaN 0 2007.3333 2050 -3 128
5 DOCU 2020-01-15 197.0000 2 197.0000 210 -10 130
6 DOCU 2020-04-10 173.6667 27 188.0000 205 -5 85
7 DOCU 2020-04-20 0.0000 -1 131.3333 225 -3 281
EDIT
To keep track of the remaining stocks we need a second df to hold the current portfolio data. I did not optimize the code and editted only the loop, the purrr
adaption should be pretty straight foreward though.
library(tidyverse) # need dplyr, using lubridate for date parsing and tidyr to fill data.frame
# include identifier variable, manipulate column name and union sells and buy
df <- dplyr::mutate(buy, io = "i") %>%
dplyr::rename(TID = BuyTransactionID) %>%
dplyr::union(dplyr::mutate(sell, io = "o") %>%
dplyr::rename(TID = SellTransactionID)) %>%
# sort the data
dplyr::arrange(Ticker, Date) %>%
# make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity),
TID = ifelse(io == "i", NA, TID),
Date = lubridate::mdy(Date),
hprice = NA_real_) %>%
# group data to fill backwards per group
dplyr::group_by(Ticker) %>%
tidyr::fill(TID, .direction = "up") %>%
# ungroup to prevent unwanted behaviour downstream
dplyr::ungroup()
dfo <- df[0, ] # empty copy of df
dfh <- df[0, c("Ticker", "Price", "Quantity", "io")] # to hold current stock aka portfolio
for (i in sort(unique(df$TID))) {
# which ticker are we working with at this sale
t <- unique(df[df$TID == i, ]$Ticker)
# bind data from last sale of this ticker to current sale
sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"),
df[df$TID == i, ])
# bind data from current portfolio to buys between last and current sale (new port folio before sale)
sdfh <- rbind(dfh[dfh$Ticker == t, ],
df[df$TID == i & df$io == "i", c("Ticker", "Price", "Quantity", "io")])
# current sales quantiy as positive value
o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
# copy to use for greedy algo
o2 <- o1
# vectors of price and qtd of bought shares at this sale, having price in decreasing order
ip <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
iq <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
ips <- ip
iqs <- iq
# total value of shares in greedy
v <- 0
# loop to run over bought prices and quantities do reduce from sold qtd per sales block
# check if shares block is larger then remaining sales qtd to break loop
for(l in 1:length(ip)){
if(o2 < iq[l]){
v <- v + ip[l] * o2
iqs[1] <- iqs[1] - o2
break
}else{
o2 <- o2 - iq[l]
v <- v + ip[l] * iq[l]
ips <- ips[-1]
iqs <- iqs[-1]
}
}
dfh <- rbind(dfh[dfh$Ticker != t, ],
data.frame(Ticker = t,
Price = ips,
Quantity = iqs,
io = "i"))
# fill sales block frame and bind to output df
dfo <- rbind(dfo,
data.frame(TID = i,
Ticker = t,
Date = max(sdf$Date),
Price = sum(ips * iqs) / sum(iqs),
Quantity = sum(iqs),
io = "i",
hprice = v/o1))
}
dfo
TID Ticker Date Price Quantity io hprice
1 1 MSFT 2020-01-07 106.5652 69 i 115
2 2 MSFT 2020-01-20 105.0000 57 i 114
3 3 MSFT 2020-01-21 104.8750 56 i 112
4 4 MSFT 2020-01-22 104.1765 51 i 112
Upvotes: 1
Reputation: 547
The answer by @DPH is excellent, but unfortunately not quite accurate enough. I will explain why.
Here is a new dataset where all the purchases precede the sales:
buy = data.frame(BuyTransactionID = c(1:10),
Ticker=c(rep('MSFT',10)),
Date=c(rep('01-01-2020',10)),
Price=c(100,102,102,107,105,111,109,112,115,106),
Quantity=c(10,10,5,5,1,1,2,12,15,15))
sell = data.frame(SellTransactionID=c(1:4),
Ticker=c('MSFT','MSFT','MSFT', 'MSFT'),
Date=c('01-07-2020','01-20-2020','01-21-2020',
'01-22-2020'),
Price=c(120,119,117, 121),
Quantity=c(7,12,1, 5))
If you apply the solution from @DPH, you will get this result:
Notice that the 'Remain_Price' does not change, nor does the 'Sales_Cost' for the last three transactions. This happens because the function determines how many shares remain after the first sale and what the average price of the remaining shares is. The shares purchased preceding the first sale can no longer be sold individually. They are now treated as a single entity with an average price and the remaining number of shares.
For example, a total of 76 shares were bought in this example. The first sale sells 7 shares. Now, 69 shares remain as seen in 'Remain_Qtd'. An average price is calculated for those remaining shares - that price is $106.5652. Now, the process considers all 69 shares to be priced at $106.5652 and the remaining sales reduce the quantity of 'Remain_Qtd', but does not change the 'Remain_Price'. The remaining shares can no longer be considered at the price that they were bought at, they are collectively part of the remaining shares and the average remaining price.
This occurs because of the object dfo
and the recycling of dfo
in the object sdf
. In particular, this line calculates an average remaining price that is then recycled through dfo
and sdf
.
Price = (sum(ip * iq) - v) / sum(sdf$Quantity)
and Quantity = sum(sdf$Quantity)
adds together all the remaining shares.
I think the answer by @DPH is brilliant, but hope that it can be modified to treat each purchase individually rather than aggregating past purchases.
Upvotes: 1