How to sort data frame into groups based multiple criteria

Question

I am analyzing some stock data and already calculated an illiquidity measure for each stock (cit). I now want to create portfolios from my stocks based on this illiquidity measure for each month in my data set.

I have already looked into the split()-function but have found no way to split my data for each month.

This is a sample of my data set. I have got a large data set behind this sample (22.000.000 rows containing 30 years of stock data).

         date cit
1  1990-01-01   1
2  1990-01-01   2
3  1990-01-01   3
4  1990-01-01   4
5  1990-01-01   5
6  1990-01-01   6
7  1990-01-01   7
8  1990-01-01   8
9  1990-01-01   9
10 1990-01-01  10
11 1990-02-01  11
12 1990-02-01  12
13 1990-02-01  13
14 1990-02-01  14
15 1990-02-01  15
16 1990-02-01  16
17 1990-02-01  17
18 1990-02-01  18
19 1990-02-01  19
20 1990-02-01  20

I want to create portfolios of my stock data for each month and based on the cit-measure, e.g. from lowest to highest cit, with the average cit computed for each portfolio. The expected outcome looks like this:

         date  cit portfolio
1  1990-01-01  1.5         1
2  1990-01-01  3.5         2
3  1990-01-01  5.5         3
4  1990-01-01  7.5         4
5  1990-01-01  9.5         5
6  1990-01-01 11.5         1
7  1990-01-01 13.5         2
8  1990-01-01 15.5         3
9  1990-01-01 17.5         4
10 1990-01-01 19.5         5

Any help is greatly appreciated. Thank you so much!

Andre&#233; · Accepted Answer

There is nothing pretty about this solution but maybe it helps, the lubridate package is useful to extract the month to split the data.

library(tidyverse);library(magrittr);library(lubridate)
df <- list("date" = c(rep("1990-01-01",10),rep("1990-02-01",10)),
              "cit"=1:20) %>% as.data.frame()
df$date %<>% as.Date()

df.res <-
    split(df, df$date) %>%
        map_df(~{
            .x %>% 
            mutate(portfolio = cut(cit, breaks = 5, labels = 1:5)) %>%
            group_by(portfolio) %>% 
            summarise(meanCit = mean(cit)) %>%
            transmute(year= year(.x$date)[1],
                   month= month(.x$date)[1],
                   meanCit,
                   portfolio)
            })

df.res

How to sort data frame into groups based multiple criteria

Answers (1)

Related Questions