Max
Max

Reputation: 117

Number of remaining days of a month after maximum value appear

I have a panel data frame like this

date    firms return
5/1/1988    A   5
6/1/1988    A   6
7/1/1988    A   4
8/1/1988    A   5
9/1/1988    A   6
11/1/1988   A   6
12/1/1988   A   13
13/01/1988  A   3
14/01/1988  A   2
15/01/1988  A   5
16/01/1988  A   2
18/01/1988  A   7
19/01/1988  A   3
20/01/1988  A   5
21/01/1988  A   7
22/01/1988  A   5
23/01/1988  A   9
25/01/1988  A   1
26/01/1988  A   5
27/01/1988  A   2
28/01/1988  A   7
29/01/1988  A   2
5/1/1988    B   5
6/1/1988    B   7
7/1/1988    B   5
8/1/1988    B   9
9/1/1988    B   1
11/1/1988   B   5
12/1/1988   B   2
13/01/1988  B   7
14/01/1988  B   2
15/01/1988  B   5
16/01/1988  B   6
18/01/1988  B   8
19/01/1988  B   5
20/01/1988  B   4
21/01/1988  B   3
22/01/1988  B   18
23/01/1988  B   5
25/01/1988  B   2
26/01/1988  B   7
27/01/1988  B   3
28/01/1988  B   9
29/01/1988  B   2

Now from the above panel data, I want to find a variable called DMAX. DMAX means the unit of days as the difference between the Maximum return day and the last trading day of the same month. For example, in January 1988 the Maximum return appears on 12 Jan 1988 for firm A. Hence the DMAX is the number of days between 12 Jan 1988 to the end of that month which is 15 days. For firm B, the maximum value appears on 22 Jan 1988. So the remaining number of days of that month is 6 days. Therefore the expected outcome is

date    Firms   DMAX(days)
Jan-88    A      15
Jan-88    B      6

I would be grateful if you can help me in this regard.

Upvotes: 3

Views: 62

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269905

1) Base R For each year/month and firm aggregate the difference between the number of rows and the position of the maximum return row. No packages are used.

with(transform(DF, date = as.Date(date, "%d/%m/%Y")), 
  aggregate(list(DMAX = return), 
    data.frame(date = format(date, "%Y-%m"), firms),
    function(x) length(x) - which.max(x)))

giving:

     date firms DMAX
1 1988-01     A   15
2 1988-01     B    6

2) zoo Read DF into a zoo object zd with one column per firm and then aggregate that by year/month. Finally melt it to a long form data frame using fortify.zoo. The fortify.zoo line can be omitted if a zoo time series object is ok as the result.

library(zoo)

zd <- read.zoo(DF, index = "date", format = "%d/%m/%Y", split = "firms")
ag <- aggregate(zd, as.yearmon, function(x) length(na.omit(x)) - which.max(na.omit(x)))
fortify.zoo(ag, melt = TRUE)

giving:

     Index Series Value
1 Jan 1988      A    15
2 Jan 1988      B     6

Note that ag is a monthly zoo series of the form:

> ag
          A B
Jan 1988 15 6

3) data.table

library(data.table)

DT <- as.data.table(DF)
DT[, list(DMAX = .N - which.max(return)), 
       by = list(date = format(as.Date(date, "%d/%m/%Y"), "%Y-%m"), firms)]

giving:

      date firms DMAX
1: 1988-01     A   15
2: 1988-01     B    6

Note

Lines <- "
date    firms return
5/1/1988    A   5
6/1/1988    A   6
7/1/1988    A   4
8/1/1988    A   5
9/1/1988    A   6
11/1/1988   A   6
12/1/1988   A   13
13/01/1988  A   3
14/01/1988  A   2
15/01/1988  A   5
16/01/1988  A   2
18/01/1988  A   7
19/01/1988  A   3
20/01/1988  A   5
21/01/1988  A   7
22/01/1988  A   5
23/01/1988  A   9
25/01/1988  A   1
26/01/1988  A   5
27/01/1988  A   2
28/01/1988  A   7
29/01/1988  A   2
5/1/1988    B   5
6/1/1988    B   7
7/1/1988    B   5
8/1/1988    B   9
9/1/1988    B   1
11/1/1988   B   5
12/1/1988   B   2
13/01/1988  B   7
14/01/1988  B   2
15/01/1988  B   5
16/01/1988  B   6
18/01/1988  B   8
19/01/1988  B   5
20/01/1988  B   4
21/01/1988  B   3
22/01/1988  B   18
23/01/1988  B   5
25/01/1988  B   2
26/01/1988  B   7
27/01/1988  B   3
28/01/1988  B   9
29/01/1988  B   2
"
DF <- read.table(text = Lines, header = TRUE)

Upvotes: 0

jazzurro
jazzurro

Reputation: 23574

One way using the dplyr package would be the following. I called your data mydf. First, manipulate date. Then, group the data by date and firms. Then, you look for the row with the largest value in return and handle subtraction.

mutate(mydf, date = format(as.Date(date, format = "%d/%m/%Y"), "%m-%Y")) %>%
group_by(date, firms) %>%
summarize(DMAX = n() - which.max(return))

# A tibble: 2 x 3
# Groups:   date [?]
#  date    firms  DMAX
#  <chr>   <fct> <int>
#1 01-1988 A        15
#2 01-1988 B         6

DATA

mydf <-structure(list(date = structure(c(18L, 19L, 20L, 21L, 22L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 20L, 21L, 22L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L), .Label = c("11/1/1988", 
"12/1/1988", "13/01/1988", "14/01/1988", "15/01/1988", "16/01/1988", 
"18/01/1988", "19/01/1988", "20/01/1988", "21/01/1988", "22/01/1988", 
"23/01/1988", "25/01/1988", "26/01/1988", "27/01/1988", "28/01/1988", 
"29/01/1988", "5/1/1988", "6/1/1988", "7/1/1988", "8/1/1988", 
"9/1/1988"), class = "factor"), firms = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
return = c(5L, 6L, 4L, 5L, 6L, 6L, 13L, 3L, 2L, 5L, 2L, 7L, 
3L, 5L, 7L, 5L, 9L, 1L, 5L, 2L, 7L, 2L, 5L, 7L, 5L, 9L, 1L, 
5L, 2L, 7L, 2L, 5L, 6L, 8L, 5L, 4L, 3L, 18L, 5L, 2L, 7L, 
3L, 9L, 2L)), class = "data.frame", row.names = c(NA, -44L
))

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76605

Here is a tidyverse solution.

library(tidyverse)
library(zoo)

df1 %>%
  mutate(date = dmy(date),
         month = as.yearmon(date)) %>%
  group_by(firms, month) %>%
  summarise(i = which(return == max(return)),
            DMAX = last(date) - date[last(i)]) %>%
  select(month, firms, DMAX)
## A tibble: 2 x 3
## Groups:   firms [2]
#  month         firms DMAX     
#  <S3: yearmon> <chr> <time>   
#1 Jan 1988      A     17 days  
#2 Jan 1988      B     " 7 days"

Upvotes: 0

Related Questions