jimken
jimken

Reputation: 117

Create column based upon date

I have data similar to the following, but larger

boat = c(1,1,1,1,1,1,1,2,2,2,2,2,2)
species = c("cod", "haddock", "ling", 
       "cod", "haddock", "ling", "tusk", 
       "cod", "haddock", "ling",
       "cod", "haddock", "ling")
date = c(as.Date(c("1.03.2017","1.03.2017","1.03.2017",
               "2.03.2017", "2.03.2017", "2.03.2017","2.03.2017",
               "4.03.2017","4.03.2017","4.03.2017",
               "7.03.2017", "7.03.2017", "7.03.2017"), "%d.%m.%Y"))
df <- data.frame(boat, species, date)

    df
    boat  species  date
    1     cod      01.03.2017
    1     haddock  01.03.2017
    1     ling     01.03.2017
    1     cod      02.03.2017
    1     haddock  02.03.2017
    1     ling     02.03.2017
    1     tusk     02.03.2017
    2     cod      04.03.2017
    2     haddock  04.03.2017
    2     ling     04.03.2017
    2     cod      07.03.2017
    2     haddock  07.03.2017
    2     ling     07.03.2017

I want to create an additional column ranking the dates by boat in order so my dataset will look like this.

    boat  species  date       rank
    1     cod      01.03.2017 1
    1     haddock  01.03.2017 1
    1     ling     01.03.2017 1
    1     cod      02.03.2017 2
    1     haddock  02.03.2017 2
    1     ling     02.03.2017 2
    1     tusk     02.03.2017 2
    2     cod      04.03.2017 1
    2     haddock  04.03.2017 1
    2     ling     04.03.2017 1
    2     cod      07.03.2017 2
    2     haddock  07.03.2017 2
    2     ling     07.03.2017 2

I have tried the following code

library(dplyr)

df %>% 
group_by(boat, species) %>% 
mutate(Order = rank(date))

but species which did not appear previously are given rank "1" the first time they appear. Any help is appreciated.

Upvotes: 3

Views: 74

Answers (4)

lmo
lmo

Reputation: 38500

In base R, you could use ave for the group-level calculations and perform those calculations using cumsum, diff, and sign on an integer constructed from the date variable.

df$rank <- ave(as.integer(df$date),
               df$boat, FUN=function(x) cumsum(c(1, sign(diff(x)))))

this returns

df
   boat species       date rank
1     1     cod 2017-03-01    1
2     1 haddock 2017-03-01    1
3     1    ling 2017-03-01    1
4     1     cod 2017-03-02    2
5     1 haddock 2017-03-02    2
6     1    ling 2017-03-02    2
7     1    tusk 2017-03-02    2
8     2     cod 2017-03-04    1
9     2 haddock 2017-03-04    1
10    2    ling 2017-03-04    1
11    2     cod 2017-03-07    2
12    2 haddock 2017-03-07    2
13    2    ling 2017-03-07    2

As an excuse to use the newly introduced (R 3.3.0) grouping function, you could also do

df$rank2 <- ave(as.integer(df$date), df$boat,
                FUN=function(x) {tmp <- attr(grouping(x), "ends");
                                 rep(seq_along(tmp), c(tmp[1], diff(tmp)))})

Upvotes: 1

M--
M--

Reputation: 28850

library(dplyr)

left_join(df, 
             unique(df[,c(1,3)]) %>% 
                                 group_by(boat) %>% 
                                 mutate(Order = rank(date)))


##    boat species       date Order
## 1     1     cod 2017-03-01     1
## 2     1 haddock 2017-03-01     1
## 3     1    ling 2017-03-01     1
## 4     1     cod 2017-03-02     2
## 5     1 haddock 2017-03-02     2
## 6     1    ling 2017-03-02     2
## 7     1    tusk 2017-03-02     2
## 8     2     cod 2017-03-04     1
## 9     2 haddock 2017-03-04     1
## 10    2    ling 2017-03-04     1
## 11    2     cod 2017-03-07     2
## 12    2 haddock 2017-03-07     2
## 13    2    ling 2017-03-07     2

Upvotes: 1

www
www

Reputation: 39154

We can use group_by and dense_rank from dplyr to create the desired output.

library(dplyr)

df2 <- df %>%
  group_by(boat) %>%
  mutate(rank = dense_rank(date))

df2
# A tibble: 13 x 4
# Groups:   boat [2]
    boat species       date  rank
   <dbl>  <fctr>     <date> <int>
 1     1     cod 2017-03-01     1
 2     1 haddock 2017-03-01     1
 3     1    ling 2017-03-01     1
 4     1     cod 2017-03-02     2
 5     1 haddock 2017-03-02     2
 6     1    ling 2017-03-02     2
 7     1    tusk 2017-03-02     2
 8     2     cod 2017-03-04     1
 9     2 haddock 2017-03-04     1
10     2    ling 2017-03-04     1
11     2     cod 2017-03-07     2
12     2 haddock 2017-03-07     2
13     2    ling 2017-03-07     2

Upvotes: 5

CPak
CPak

Reputation: 13581

Solution

 df %>% 
    group_by(boat) %>% 
    mutate(Order=cumsum(lag(date,default=head(date,1)) != date)+1)

Output

    boat species       date Order
 1     1     cod 2017-03-01     1
 2     1 haddock 2017-03-01     1
 3     1    ling 2017-03-01     1
 4     1     cod 2017-03-02     2
 5     1 haddock 2017-03-02     2
 6     1    ling 2017-03-02     2
 7     1    tusk 2017-03-02     2
 8     2     cod 2017-03-04     1
 9     2 haddock 2017-03-04     1
10     2    ling 2017-03-04     1
11     2     cod 2017-03-07     2
12     2 haddock 2017-03-07     2
13     2    ling 2017-03-07     2

Upvotes: 1

Related Questions