boshek

Reputation: 4406

Continuing a sequence into NAs using dplyr

I am trying to figure out a dplyr-specific way of continuing a sequence of numbers when that column contains NAs.

For example I have this dataframe:

library(tibble)

dat <- tribble(
  ~x, ~group,
  1, "A",
  2, "A",
  NA_real_, "A",
  NA_real_, "A",
  1, "B",
  NA_real_, "B",
  3, "B"
)

dat
#> # A tibble: 7 × 2
#>       x group
#>   <dbl> <chr>
#> 1     1 A    
#> 2     2 A    
#> 3    NA A    
#> 4    NA A    
#> 5     1 B    
#> 6    NA B    
#> 7     3 B

I would like this one:

#> # A tibble: 7 × 2
#>       x group
#>   <dbl> <chr>
#> 1     1 A    
#> 2     2 A    
#> 3     3 A    
#> 4     4 A    
#> 5     1 B    
#> 6     2 B    
#> 7     3 B

When I try this I get a warning which makes me think I am probably approaching this incorrectly:

library(dplyr)

dat %>%
  group_by(group) %>%
  mutate(n = n()) %>%
  mutate(new_seq = seq_len(n))
#> Warning in seq_len(n): first element used of 'length.out' argument

#> Warning in seq_len(n): first element used of 'length.out' argument
#> # A tibble: 7 × 4
#> # Groups:   group [2]
#>       x group     n new_seq
#>   <dbl> <chr> <int>   <int>
#> 1     1 A         4       1
#> 2     2 A         4       2
#> 3    NA A         4       3
#> 4    NA A         4       4
#> 5     1 B         3       1
#> 6    NA B         3       2
#> 7     3 B         3       3

Upvotes: 1

Views: 39

Answers (3)

ThomasIsCoding

Reputation: 101307

A base R option using ave (which works in a similar way to group_by in dplyr):

> transform(dat, x = ave(x, group, FUN = seq_along))
  x group
1 1     A
2 2     A
3 3     A
4 4     A
5 1     B
6 2     B
7 3     B

Upvotes: 1

akrun

Reputation: 887078

We could use rowid (from data.table) directly if the intention is to create a sequence and the group size is just an intermediate column:

library(data.table)
library(dplyr)
dat %>% 
   mutate(new_seq = rowid(group))

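The issue with using the column after it is created is that it is no longer a single value: n is repeated on every row of the group, as shown in @Maël's post. If we need to do that, wrap it in first, because seq_len is not vectorized and expects a single number. The intermediate column is not needed here anyway.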

dat %>%
  group_by(group) %>%
  mutate(n = n()) %>%
  mutate(new_seq = seq_len(first(n)))
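
To see why first helps, here is a minimal base R sketch of what seq_len receives inside one group. The vector n_col is a hypothetical stand-in for the n column of group A, not part of the original pipeline.

```r
# Hypothetical stand-in for the `n` column within group "A":
# the group size, repeated once per row of the group
n_col <- c(4L, 4L, 4L, 4L)

# seq_len() expects a single number, so take the first element explicitly
new_seq <- seq_len(n_col[1])
new_seq
#> [1] 1 2 3 4
```

Passing n_col itself would hand seq_len four values where it wants one, which is what triggers the warning in the question.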

Upvotes: 1

Maël

Reputation: 51974

It's easier if you do it in one go. Your approach is not 'wrong'; it is just that seq_len needs a single integer and you are giving it a vector (n), so seq_len uses only the first value and warns about it.

dat %>% 
  group_by(group) %>% 
  mutate(x = seq_len(n()))

Note that row_number might be even easier here:

dat %>% 
  group_by(group) %>% 
  mutate(x = row_number())

Upvotes: 2
