elliot

Reputation: 1944

How to create unique identifier for non-repetitive rows?

I'm trying to create a flag variable that changes at each occurrence of 'Position' in the variable column. For example, here is a tibble:

library(tibble)

df <- tibble(
  variable = c('Position',
               'Department',
               'Location',
               'Position',
               'Department',
               'Location',
               'Location'
               )
)

df
    # A tibble: 7 x 1
      variable  
      <chr>     
    1 Position  
    2 Department
    3 Location  
    4 Position  
    5 Department
    6 Location  
    7 Location 

How can I create something similar to this id variable? With it, I could split on the id and merge cells as needed.

# A tibble: 7 x 2
  variable   id   
  <chr>      <chr>
1 Position   A    
2 Department A    
3 Location   A    
4 Position   B    
5 Department B    
6 Location   B    
7 Location   B  

Even better would be a way to merge any cells that have duplicates in the variable column.

Upvotes: 2

Views: 39

Answers (2)

Rui Barradas

Reputation: 76412

A base R approach would be with duplicated. I will borrow @akrun's idea of subsetting the built-in vector LETTERS.

LETTERS[duplicated(df$variable) + 1L]
#[1] "A" "A" "A" "B" "B" "B" "B"

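So all you have to do is assign this result to the new column. Note that duplicated only separates first occurrences from repeats, so this yields at most two ids (A and B), which is enough for the example data.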

df$id <- LETTERS[duplicated(df$variable) + 1L]
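If the data can contain more than two groups, one possible base R generalization is to count the 'Position' rows with cumsum. The following is only a sketch: it assumes the first row is 'Position' and that every group starts with a 'Position' row, and the third group appended to the example data is hypothetical, added just to show the effect.

```r
# Example data from the question, plus a hypothetical third group at the end
variable <- c('Position', 'Department', 'Location',
              'Position', 'Department', 'Location', 'Location',
              'Position', 'Location')
df <- data.frame(variable = variable, stringsAsFactors = FALSE)

# Each 'Position' row starts a new group; cumsum() gives the running group number
grp <- cumsum(df$variable == 'Position')

# LETTERS has 26 entries, so this labels at most 26 groups
df$id <- LETTERS[grp]
df$id
#[1] "A" "A" "A" "B" "B" "B" "B" "C" "C"
```

If more than 26 groups are possible, keeping grp as an integer id avoids running out of letters.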

Upvotes: 2

akrun

Reputation: 887128

We create a logical vector marking where 'Position' occurs in 'variable', take its cumulative sum (cumsum), and use that numeric index to subset LETTERS.

library(dplyr)
df %>%
  mutate(id = LETTERS[cumsum(variable == 'Position')])
# A tibble: 7 x 2
#  variable   id   
#  <chr>      <chr>
#1 Position   A    
#2 Department A    
#3 Location   A    
#4 Position   B    
#5 Department B    
#6 Location   B    
#7 Location   B    
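For the follow-up about merging duplicates, one reading is that repeated values of variable within the same id should collapse into a single row. Under that assumption (the question does not spell out what "merge" means), a minimal sketch is to add distinct() after the mutate step. The name out is only illustrative.

```r
library(dplyr)

# Example data from the question
df <- tibble(
  variable = c('Position', 'Department', 'Location',
               'Position', 'Department', 'Location', 'Location')
)

out <- df %>%
  mutate(id = LETTERS[cumsum(variable == 'Position')]) %>%
  distinct()  # drops rows that repeat an earlier (variable, id) pair

out
```

Here the second 'Location' row in group B is dropped, leaving six rows. Rows that repeat across different ids are kept, because distinct() compares all columns.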

Upvotes: 3
