Florian

Reputation: 1258

Different results with group_by() and n() (paste0)

I want to add a counting tag to an ID. To do this, I group my data, and if a group has more than one row, a counting tag is added. The group size is determined with dplyr::n(). If n() is used outside of paste0(), my code works; if it's used inside, the code does not work. What is the reason for the different results?

n() outside paste0() --> right result

tibble::tibble(Group = c("A","A","B","C"),
               ID = c("ID_1", "ID_2", "ID_10", "ID_20")) %>% 
  dplyr::group_by(Group) %>% 
  dplyr::mutate(n = dplyr::n(),
                tag = ifelse(n > 1, 
                             paste0(ID, " #", dplyr::row_number()),
                             ID)) %>% 
  dplyr::ungroup()

A tibble: 4 x 4
  Group ID        n tag    
  <chr> <chr> <int> <chr>  
1 A     ID_1      2 ID_1 #1
2 A     ID_2      2 ID_2 #2
3 B     ID_10     1 ID_10  
4 C     ID_20     1 ID_20 

n() inside paste0() --> wrong result (tags are both ID_1 #1)

tibble::tibble(Group = c("A","A","B","C"),
               ID = c("ID_1", "ID_2", "ID_10", "ID_20")) %>% 
  dplyr::group_by(Group) %>% 
  dplyr::mutate(tag = ifelse(dplyr::n() > 1, 
                             paste0(ID, " #", dplyr::row_number()),
                             ID)) %>% 
  dplyr::ungroup()

A tibble: 4 x 3
  Group ID    tag    
  <chr> <chr> <chr>  
1 A     ID_1  ID_1 #1
2 A     ID_2  ID_1 #1
3 B     ID_10 ID_10  
4 C     ID_20 ID_20

Upvotes: 0

Views: 73

Answers (1)

Ronak Shah

Reputation: 389047

Because the condition n() > 1 has length 1, and ifelse() returns a vector of the same length as the condition it checks. You can use if/else here instead:

tibble::tibble(Group = c("A","A","B","C"),
               ID = c("ID_1", "ID_2", "ID_10", "ID_20")) %>% 
  dplyr::group_by(Group) %>% 
  # if/else takes a length-1 condition and returns the whole chosen branch
  dplyr::mutate(tag = if (dplyr::n() > 1) paste0(ID, " #", dplyr::row_number()) 
                      else ID) %>% 
  dplyr::ungroup()

# A tibble: 4 x 3
#  Group ID    tag    
#  <chr> <chr> <chr>  
#1 A     ID_1  ID_1 #1
#2 A     ID_2  ID_2 #2
#3 B     ID_10 ID_10  
#4 C     ID_20 ID_20  

In your first attempt the condition is n > 1, where n is a column, so for the first group (Group == A) the condition has length 2 and ifelse() returns two values. In the second attempt the condition is n() > 1, which has length 1, so ifelse() generates only one value (ID_1 #1), and that value is recycled for the other row of the group.
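
The length rule can be checked in base R without dplyr. This is only a minimal sketch: the vectors ids and tags are made-up stand-ins for the rows of Group A, and length(ids) plays the role of n().

```r
# Hypothetical stand-ins for one group (Group A) of the question's data.
ids  <- c("ID_1", "ID_2")
tags <- paste0(ids, " #", seq_along(ids))

# Length-1 condition: ifelse() returns one value, the first element of `tags`.
short <- ifelse(length(ids) > 1, tags, ids)

# Length-2 condition (like the stored column n): one value per row.
full <- ifelse(rep(length(ids), 2) > 1, tags, ids)

# if/else returns the chosen branch as a whole vector.
whole <- if (length(ids) > 1) tags else ids

print(short)
print(full)
print(whole)
```

Inside mutate(), the single value in short is what gets recycled across the group, which is why both rows of Group A ended up with the same tag.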

Upvotes: 1
