Reputation: 117
I have data similar to the following, but larger
boat = c(1,1,1,1,1,1,1,2,2,2,2,2,2)
species = c("cod", "haddock", "ling",
"cod", "haddock", "ling", "tusk",
"cod", "haddock", "ling",
"cod", "haddock", "ling")
date = c(as.Date(c("1.03.2017","1.03.2017","1.03.2017",
"2.03.2017", "2.03.2017", "2.03.2017","2.03.2017",
"4.03.2017","4.03.2017","4.03.2017",
"7.03.2017", "7.03.2017", "7.03.2017"), "%d.%m.%Y"))
df <- data.frame(boat, species, date)
df
boat species date
1 cod 01.03.2017
1 haddock 01.03.2017
1 ling 01.03.2017
1 cod 02.03.2017
1 haddock 02.03.2017
1 ling 02.03.2017
1 tusk 02.03.2017
2 cod 04.03.2017
2 haddock 04.03.2017
2 ling 04.03.2017
2 cod 07.03.2017
2 haddock 07.03.2017
2 ling 07.03.2017
I want to create an additional column ranking the dates by boat in order so my dataset will look like this.
boat species date rank
1 cod 01.03.2017 1
1 haddock 01.03.2017 1
1 ling 01.03.2017 1
1 cod 02.03.2017 2
1 haddock 02.03.2017 2
1 ling 02.03.2017 2
1 tusk 02.03.2017 2
2 cod 04.03.2017 1
2 haddock 04.03.2017 1
2 ling 04.03.2017 1
2 cod 07.03.2017 2
2 haddock 07.03.2017 2
2 ling 07.03.2017 2
I have tried the following code
library(dplyr)
df %>%
group_by(boat, species) %>%
mutate(Order = rank(date))
but species which did not appear previously are given rank "1" the first time they appear. Any help is appreciated.
Upvotes: 3
Views: 74
Reputation: 38500
In base R, you could use ave
for the group-level calculations and perform those calculations using cumsum
, diff
, and sign
on an integer constructed from the date variable.
df$rank <- ave(as.integer(df$date),
df$boat, FUN=function(x) cumsum(c(1, sign(diff(x)))))
this returns
df
boat species date rank
1 1 cod 2017-03-01 1
2 1 haddock 2017-03-01 1
3 1 ling 2017-03-01 1
4 1 cod 2017-03-02 2
5 1 haddock 2017-03-02 2
6 1 ling 2017-03-02 2
7 1 tusk 2017-03-02 2
8 2 cod 2017-03-04 1
9 2 haddock 2017-03-04 1
10 2 ling 2017-03-04 1
11 2 cod 2017-03-07 2
12 2 haddock 2017-03-07 2
13 2 ling 2017-03-07 2
As an excuse to use the newly introduced (R 3.3.0) grouping
function, you could also do
df$rank2 <- ave(as.integer(df$date), df$boat,
FUN=function(x) {tmp <- attr(grouping(x), "ends");
rep(seq_along(tmp), c(tmp[1], diff(tmp)))})
Upvotes: 1
Reputation: 28850
library(dplyr)
left_join(df,
unique(df[,c(1,3)]) %>%
group_by(boat) %>%
mutate(Order = rank(date)))
## boat species date Order
## 1 1 cod 2017-03-01 1
## 2 1 haddock 2017-03-01 1
## 3 1 ling 2017-03-01 1
## 4 1 cod 2017-03-02 2
## 5 1 haddock 2017-03-02 2
## 6 1 ling 2017-03-02 2
## 7 1 tusk 2017-03-02 2
## 8 2 cod 2017-03-04 1
## 9 2 haddock 2017-03-04 1
## 10 2 ling 2017-03-04 1
## 11 2 cod 2017-03-07 2
## 12 2 haddock 2017-03-07 2
## 13 2 ling 2017-03-07 2
Upvotes: 1
Reputation: 39154
We can use group_by
and dense_rank
from dplyr
to create the desired output.
library(dplyr)
df2 <- df %>%
group_by(boat) %>%
mutate(rank = dense_rank(date))
df2
# A tibble: 13 x 4
# Groups: boat [2]
boat species date rank
<dbl> <fctr> <date> <int>
1 1 cod 2017-03-01 1
2 1 haddock 2017-03-01 1
3 1 ling 2017-03-01 1
4 1 cod 2017-03-02 2
5 1 haddock 2017-03-02 2
6 1 ling 2017-03-02 2
7 1 tusk 2017-03-02 2
8 2 cod 2017-03-04 1
9 2 haddock 2017-03-04 1
10 2 ling 2017-03-04 1
11 2 cod 2017-03-07 2
12 2 haddock 2017-03-07 2
13 2 ling 2017-03-07 2
Upvotes: 5
Reputation: 13581
df %>%
group_by(boat) %>%
mutate(Order=cumsum(lag(date,default=head(date,1)) != date)+1)
boat species date Order
1 1 cod 2017-03-01 1
2 1 haddock 2017-03-01 1
3 1 ling 2017-03-01 1
4 1 cod 2017-03-02 2
5 1 haddock 2017-03-02 2
6 1 ling 2017-03-02 2
7 1 tusk 2017-03-02 2
8 2 cod 2017-03-04 1
9 2 haddock 2017-03-04 1
10 2 ling 2017-03-04 1
11 2 cod 2017-03-07 2
12 2 haddock 2017-03-07 2
13 2 ling 2017-03-07 2
Upvotes: 1