Reputation: 117
I have a panel data frame like this
date firms return
5/1/1988 A 5
6/1/1988 A 6
7/1/1988 A 4
8/1/1988 A 5
9/1/1988 A 6
11/1/1988 A 6
12/1/1988 A 13
13/01/1988 A 3
14/01/1988 A 2
15/01/1988 A 5
16/01/1988 A 2
18/01/1988 A 7
19/01/1988 A 3
20/01/1988 A 5
21/01/1988 A 7
22/01/1988 A 5
23/01/1988 A 9
25/01/1988 A 1
26/01/1988 A 5
27/01/1988 A 2
28/01/1988 A 7
29/01/1988 A 2
5/1/1988 B 5
6/1/1988 B 7
7/1/1988 B 5
8/1/1988 B 9
9/1/1988 B 1
11/1/1988 B 5
12/1/1988 B 2
13/01/1988 B 7
14/01/1988 B 2
15/01/1988 B 5
16/01/1988 B 6
18/01/1988 B 8
19/01/1988 B 5
20/01/1988 B 4
21/01/1988 B 3
22/01/1988 B 18
23/01/1988 B 5
25/01/1988 B 2
26/01/1988 B 7
27/01/1988 B 3
28/01/1988 B 9
29/01/1988 B 2
Now from the above panel data, I want to find a variable called DMAX. DMAX means the unit of days as the difference between the Maximum return day and the last trading day of the same month. For example, in January 1988 the Maximum return appears on 12 Jan 1988 for firm A. Hence the DMAX is the number of days between 12 Jan 1988 to the end of that month which is 15 days. For firm B, the maximum value appears on 22 Jan 1988. So the remaining number of days of that month is 6 days. Therefore the expected outcome is
date Firms DMAX(days)
Jan-88 A 15
Jan-88 B 6
I would be grateful if you can help me in this regard.
Upvotes: 3
Views: 62
Reputation: 269905
1) Base R For each year/month and firm aggregate the difference between the number of rows and the position of the maximum return row. No packages are used.
with(transform(DF, date = as.Date(date, "%d/%m/%Y")),
aggregate(list(DMAX = return),
data.frame(date = format(date, "%Y-%m"), firms),
function(x) length(x) - which.max(x)))
giving:
date firms DMAX
1 1988-01 A 15
2 1988-01 B 6
2) zoo Read DF
into a zoo object zd
with one column per firm and then aggregate that by year/month. Finally melt it to a long form data frame using fortify.zoo
. The fortify.zoo
line can be omitted if a zoo time series object is ok as the result.
library(zoo)
zd <- read.zoo(DF, index = "date", format = "%d/%m/%Y", split = "firms")
ag <- aggregate(zd, as.yearmon, function(x) length(na.omit(x)) - which.max(na.omit(x)))
fortify.zoo(ag, melt = TRUE)
giving:
Index Series Value
1 Jan 1988 A 15
2 Jan 1988 B 6
Note that ag
is a monthly zoo series of the form:
> ag
A B
Jan 1988 15 6
3) data.table
library(data.table)
DT <- as.data.table(DF)
DT[, list(DMAX = .N - which.max(return)),
by = list(date = format(as.Date(date, "%d/%m/%Y"), "%Y-%m"), firms)]
giving:
date firms DMAX
1: 1988-01 A 15
2: 1988-01 B 6
Lines <- "
date firms return
5/1/1988 A 5
6/1/1988 A 6
7/1/1988 A 4
8/1/1988 A 5
9/1/1988 A 6
11/1/1988 A 6
12/1/1988 A 13
13/01/1988 A 3
14/01/1988 A 2
15/01/1988 A 5
16/01/1988 A 2
18/01/1988 A 7
19/01/1988 A 3
20/01/1988 A 5
21/01/1988 A 7
22/01/1988 A 5
23/01/1988 A 9
25/01/1988 A 1
26/01/1988 A 5
27/01/1988 A 2
28/01/1988 A 7
29/01/1988 A 2
5/1/1988 B 5
6/1/1988 B 7
7/1/1988 B 5
8/1/1988 B 9
9/1/1988 B 1
11/1/1988 B 5
12/1/1988 B 2
13/01/1988 B 7
14/01/1988 B 2
15/01/1988 B 5
16/01/1988 B 6
18/01/1988 B 8
19/01/1988 B 5
20/01/1988 B 4
21/01/1988 B 3
22/01/1988 B 18
23/01/1988 B 5
25/01/1988 B 2
26/01/1988 B 7
27/01/1988 B 3
28/01/1988 B 9
29/01/1988 B 2
"
DF <- read.table(text = Lines, header = TRUE)
Upvotes: 0
Reputation: 23574
One way using the dplyr package would be the following. I called your data mydf. First, manipulate date
. Then, group the data by date
and firms
. Then, you look for the row with the largest value in return
and handle subtraction.
mutate(mydf, date = format(as.Date(date, format = "%d/%m/%Y"), "%m-%Y")) %>%
group_by(date, firms) %>%
summarize(DMAX = n() - which.max(return))
# A tibble: 2 x 3
# Groups: date [?]
# date firms DMAX
# <chr> <fct> <int>
#1 01-1988 A 15
#2 01-1988 B 6
DATA
mydf <-structure(list(date = structure(c(18L, 19L, 20L, 21L, 22L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L), .Label = c("11/1/1988",
"12/1/1988", "13/01/1988", "14/01/1988", "15/01/1988", "16/01/1988",
"18/01/1988", "19/01/1988", "20/01/1988", "21/01/1988", "22/01/1988",
"23/01/1988", "25/01/1988", "26/01/1988", "27/01/1988", "28/01/1988",
"29/01/1988", "5/1/1988", "6/1/1988", "7/1/1988", "8/1/1988",
"9/1/1988"), class = "factor"), firms = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
return = c(5L, 6L, 4L, 5L, 6L, 6L, 13L, 3L, 2L, 5L, 2L, 7L,
3L, 5L, 7L, 5L, 9L, 1L, 5L, 2L, 7L, 2L, 5L, 7L, 5L, 9L, 1L,
5L, 2L, 7L, 2L, 5L, 6L, 8L, 5L, 4L, 3L, 18L, 5L, 2L, 7L,
3L, 9L, 2L)), class = "data.frame", row.names = c(NA, -44L
))
Upvotes: 1
Reputation: 76605
Here is a tidyverse
solution.
library(tidyverse)
library(zoo)
df1 %>%
mutate(date = dmy(date),
month = as.yearmon(date)) %>%
group_by(firms, month) %>%
summarise(i = which(return == max(return)),
DMAX = last(date) - date[last(i)]) %>%
select(month, firms, DMAX)
## A tibble: 2 x 3
## Groups: firms [2]
# month firms DMAX
# <S3: yearmon> <chr> <time>
#1 Jan 1988 A 17 days
#2 Jan 1988 B " 7 days"
Upvotes: 0