Ljupcho Naumov
Ljupcho Naumov

Reputation: 465

Compute month on month difference in weights

I am working on some portfolio data and I'm stumped by this data manipulation. I have this sample data

df <- tibble(
  date = as.Date(c("2020-01-31", "2020-01-31", "2020-01-31", 
                   "2020-02-29", "2020-02-29", "2020-02-29",
                   "2020-03-31", "2020-03-31", "2020-03-31") ),
  id = c("KO", "AAPL", "MSFT",
         "KO", "AAPL", "GOOG", 
         "KO", "AAPL", "MSFT"),
  weight = c(0.3, 0.4, 0.3,
             0.5, 0.3, 0.2,
             0.6, 0.2, 0.2),
  
  `weight_change (desired column)` = c(NA, NA, NA,
                                       0.2, -0.1, 0.2,
                                       0.1, -0.1, 0.2)
) 

These are the positions in a sample portfolio. The portfolio gets new weights every month. What I want to calculate is the change in weight for each item in terms of the previous months weight. In this example we see that at the end of February, KO's current weight is 0.5 which is up 0.2 from the previous month. AAPL is down 0.1, while GOOG replaces MSFT so the change with the previous month is its entire current weight: 0.2. How can I set up a mutate such that it looks for the stock in the previous date and calculates the difference between the weights?

Upvotes: 0

Views: 56

Answers (3)

teofil
teofil

Reputation: 2384

A timetk approach:

library(timetk)
df %>% 
   mutate(Month = lubridate::floor_date(date, "month")) %>%
   group_by(id) %>% 
   timetk::pad_by_time(.date_var = Month, .by="month") %>% 
   select(-Month) %>% 
   mutate(WC = if(n() == 1) weight else c(NA, diff(weight)))

A tibble: 10 x 5
Groups:   id [4]
   id    date       weight weight_change     WC
   <chr> <date>      <dbl>         <dbl>  <dbl>
 1 KO    2020-01-31    0.3          NA   NA    
 2 KO    2020-02-29    0.5           0.2  0.2  
 3 KO    2020-03-31    0.6           0.1  0.100
 4 AAPL  2020-01-31    0.4          NA   NA    
 5 AAPL  2020-02-29    0.3          -0.1 -0.1  
 6 AAPL  2020-03-31    0.2          -0.1 -0.100
 7 MSFT  2020-01-31    0.3          NA   NA    
 8 MSFT  NA           NA            NA   NA    
 9 MSFT  2020-03-31    0.2           0.2 NA    
10 GOOG  2020-02-29    0.2           0.2  0.2

Upvotes: 1

AlanKinene
AlanKinene

Reputation: 21

Here is my not so compact solution. I just use some helper columns, which I leave in so that one can follow.

library(tidyverse)
library(lubridate)

df <- tibble(
  date = c("2020-01-31", "2020-01-31", "2020-01-31", 
                   "2020-02-29", "2020-02-29", "2020-02-29",
                   "2020-03-31", "2020-03-31", "2020-03-31"),
  id = c("KO", "AAPL", "MSFT", "KO", "AAPL", "GOOG", "KO", "AAPL", "MSFT"),
  weight = c(0.3, 0.4, 0.3, 0.5, 0.3, 0.2, 0.6, 0.2, 0.2),
  `weight_change (desired_column)` = c(NA, NA, NA, 0.2, -0.1, 0.2, 0.1, -0.1, 0.2)
) %>% #new code starts here
  mutate(
    date = as_date(date),
    date_ym = floor_date(date,
                         unit = "month"))%>%
  group_by(id)%>%
  arrange(date)%>%
  mutate(id_n = row_number(),
         prev_exist = case_when(lag(date_ym) == date_ym - months(1) ~ "immediate month", #if there is an immediate month
                                id_n == 1 & date != min(df$date)~ "new month", #if this is a new month
                                TRUE ~ "no immediate month"),
         weight_change = case_when(prev_exist == "new month"~ weight,
                                   prev_exist == "no immediate month" & id_n > 1~ weight,
                                   TRUE ~ weight-lag(weight)),
         date_ym = NULL,
         id_n  = NULL,
         prev_exist = NULL)

Upvotes: 1

akrun
akrun

Reputation: 887008

If the data is monthly for each 'id', we can do a complete to take account of the missing months, then do a group by diff

library(dplyr)
library(tidyr)
library(zoo)    
df %>%
     mutate(yearmonth = as.Date(as.yearmon(date))) %>%
     group_by(id) %>% 
     complete(yearmonth = seq(first(yearmonth), last(yearmonth), by = '1 month')) %>%
     mutate(weight_change = if(n() == 1) weight else c(NA, diff(replace_na(weight, 0)))) %>%
     ungroup %>%
     select(names(df), weight_change) %>%
     filter(!is.na(date))
# A tibble: 9 x 5
#  date       id    weight `weight_change (desired column)` weight_change
#  <date>     <chr>  <dbl>                            <dbl>         <dbl>
#1 2020-01-31 AAPL     0.4                             NA          NA    
#2 2020-02-29 AAPL     0.3                             -0.1        -0.1  
#3 2020-03-31 AAPL     0.2                             -0.1        -0.100
#4 2020-02-29 GOOG     0.2                              0.2         0.2  
#5 2020-01-31 KO       0.3                             NA          NA    
#6 2020-02-29 KO       0.5                              0.2         0.2  
#7 2020-03-31 KO       0.6                              0.1         0.100
#8 2020-01-31 MSFT     0.3                             NA          NA    
#9 2020-03-31 MSFT     0.2                              0.2         0.2  

Upvotes: 2

Related Questions