Reputation: 465
I am working on some portfolio data and I'm stumped by this data manipulation. I have this sample data
df <- tibble(
date = as.Date(c("2020-01-31", "2020-01-31", "2020-01-31",
"2020-02-29", "2020-02-29", "2020-02-29",
"2020-03-31", "2020-03-31", "2020-03-31") ),
id = c("KO", "AAPL", "MSFT",
"KO", "AAPL", "GOOG",
"KO", "AAPL", "MSFT"),
weight = c(0.3, 0.4, 0.3,
0.5, 0.3, 0.2,
0.6, 0.2, 0.2),
`weight_change (desired column)` = c(NA, NA, NA,
0.2, -0.1, 0.2,
0.1, -0.1, 0.2)
)
These are the positions in a sample portfolio. The portfolio gets new weights every month. What I want to calculate is the change in weight for each item in terms of the previous months weight. In this example we see that at the end of February, KO's current weight is 0.5 which is up 0.2 from the previous month. AAPL is down 0.1, while GOOG replaces MSFT so the change with the previous month is its entire current weight: 0.2. How can I set up a mutate such that it looks for the stock in the previous date and calculates the difference between the weights?
Upvotes: 0
Views: 56
Reputation: 2384
A timetk
approach:
library(timetk)
df %>%
mutate(Month = lubridate::floor_date(date, "month")) %>%
group_by(id) %>%
timetk::pad_by_time(.date_var = Month, .by="month") %>%
select(-Month) %>%
mutate(WC = if(n() == 1) weight else c(NA, diff(weight)))
A tibble: 10 x 5
Groups: id [4]
id date weight weight_change WC
<chr> <date> <dbl> <dbl> <dbl>
1 KO 2020-01-31 0.3 NA NA
2 KO 2020-02-29 0.5 0.2 0.2
3 KO 2020-03-31 0.6 0.1 0.100
4 AAPL 2020-01-31 0.4 NA NA
5 AAPL 2020-02-29 0.3 -0.1 -0.1
6 AAPL 2020-03-31 0.2 -0.1 -0.100
7 MSFT 2020-01-31 0.3 NA NA
8 MSFT NA NA NA NA
9 MSFT 2020-03-31 0.2 0.2 NA
10 GOOG 2020-02-29 0.2 0.2 0.2
Upvotes: 1
Reputation: 21
Here is my not so compact solution. I just use some helper columns, which I leave in so that one can follow.
library(tidyverse)
library(lubridate)
df <- tibble(
date = c("2020-01-31", "2020-01-31", "2020-01-31",
"2020-02-29", "2020-02-29", "2020-02-29",
"2020-03-31", "2020-03-31", "2020-03-31"),
id = c("KO", "AAPL", "MSFT", "KO", "AAPL", "GOOG", "KO", "AAPL", "MSFT"),
weight = c(0.3, 0.4, 0.3, 0.5, 0.3, 0.2, 0.6, 0.2, 0.2),
`weight_change (desired_column)` = c(NA, NA, NA, 0.2, -0.1, 0.2, 0.1, -0.1, 0.2)
) %>% #new code starts here
mutate(
date = as_date(date),
date_ym = floor_date(date,
unit = "month"))%>%
group_by(id)%>%
arrange(date)%>%
mutate(id_n = row_number(),
prev_exist = case_when(lag(date_ym) == date_ym - months(1) ~ "immediate month", #if there is an immediate month
id_n == 1 & date != min(df$date)~ "new month", #if this is a new month
TRUE ~ "no immediate month"),
weight_change = case_when(prev_exist == "new month"~ weight,
prev_exist == "no immediate month" & id_n > 1~ weight,
TRUE ~ weight-lag(weight)),
date_ym = NULL,
id_n = NULL,
prev_exist = NULL)
Upvotes: 1
Reputation: 887008
If the data is monthly for each 'id', we can do a complete
to take account of the missing months, then do a group by diff
library(dplyr)
library(tidyr)
library(zoo)
df %>%
mutate(yearmonth = as.Date(as.yearmon(date))) %>%
group_by(id) %>%
complete(yearmonth = seq(first(yearmonth), last(yearmonth), by = '1 month')) %>%
mutate(weight_change = if(n() == 1) weight else c(NA, diff(replace_na(weight, 0)))) %>%
ungroup %>%
select(names(df), weight_change) %>%
filter(!is.na(date))
# A tibble: 9 x 5
# date id weight `weight_change (desired column)` weight_change
# <date> <chr> <dbl> <dbl> <dbl>
#1 2020-01-31 AAPL 0.4 NA NA
#2 2020-02-29 AAPL 0.3 -0.1 -0.1
#3 2020-03-31 AAPL 0.2 -0.1 -0.100
#4 2020-02-29 GOOG 0.2 0.2 0.2
#5 2020-01-31 KO 0.3 NA NA
#6 2020-02-29 KO 0.5 0.2 0.2
#7 2020-03-31 KO 0.6 0.1 0.100
#8 2020-01-31 MSFT 0.3 NA NA
#9 2020-03-31 MSFT 0.2 0.2 0.2
Upvotes: 2