MPetr

Reputation: 13

Create distinctive groups for one variable

I am trying to define distinct groups based on one variable. It seems like a simple problem, but I couldn't figure it out.

In my dataset, each tree has a number of separate groups, i.e. runs of consecutive rows with the value 1 in the "dist" variable. I would like to create a new variable that assigns a unique value to each group.

My data looks like:

Tree_ID dist 
1       0    
1       1    
1       1    
1       0    
1       1    
1       0    

I would like to create a new variable (unique_gr) that assigns a unique value to each distinct group of rows where "dist == 1":

Tree_ID dist unique_gr 
1       0    0
1       1    1
1       1    1
1       0    0
1       1    2
1       0    0

I have tried using the "ifelse" function to check the current row, where "dist == 0" means no group:

 ifelse(dist == 1, "unique_gr", 0) # checking the current row

The main issue: how can I fill "unique_gr" with values that are different and increasing (e.g. 1, 2, 3, 4, ...) for each distinct group?

Thank you for your help.

Upvotes: 1

Views: 75

Answers (2)

akrun

Reputation: 887118

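Here is another option using data.table:

library(data.table)
# Step 1: number the runs of equal dist values within each tree and zero out the dist == 0 runs
# Step 2: renumber the remaining (non-zero) run ids consecutively as 1, 2, ...
setDT(df1)[, unique_gr := rleid(dist)*dist, Tree_ID][unique_gr != 0,
                     unique_gr := match(unique_gr, unique(unique_gr))]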
#   Tree_ID dist unique_gr
#1:       1    0         0
#2:       1    1         1
#3:       1    1         1
#4:       1    0         0
#5:       1    1         2
#6:       1    0         0
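
To see why the chained expression works, it may help to split it into its intermediate steps. The following is only a sketch on the six rows from the question; the helper columns run_id and masked are names introduced here for illustration and are not part of the original answer.

```r
library(data.table)

# Same six rows as in the question
df1 <- data.table(Tree_ID = 1L, dist = c(0L, 1L, 1L, 0L, 1L, 0L))

# Step 1: rleid() gives every run of equal values its own id
df1[, run_id := rleid(dist), by = Tree_ID]

# Step 2: multiplying by dist sets the id to 0 wherever dist == 0
df1[, masked := run_id * dist]

# Step 3: match() against the unique non-zero ids renumbers them as 1, 2, ...
df1[, unique_gr := 0L]
df1[masked != 0, unique_gr := match(masked, unique(masked))]

df1
```

Note that the renumbering in step 3 is not grouped by Tree_ID, so with several trees the resulting ids depend on which run ids occur in the other trees.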

Upvotes: 1

www

Reputation: 39154

A solution using tidyverse and data.table. The key is to use the rleid function from data.table.

# Create example data frame
dt <- read.table(text = "Tree_ID dist
                 1       0
                 1       1
                 1       1
                 1       0
                 1       1
                 1       0",
                 header = TRUE, stringsAsFactors = FALSE)


library(tidyverse)
library(data.table)

dt2 <- dt %>%
  mutate(unique_gr = rleid(dist)) %>%
  mutate(unique_gr = ifelse(dist != 0 & first(dist) == 0, unique_gr/2,
                            ifelse(dist != 0 & first(dist) != 0, (unique_gr + 1)/2, 0)))
dt2
#   Tree_ID dist unique_gr
# 1       1    0         0
# 2       1    1         1
# 3       1    1         1
# 4       1    0         0
# 5       1    1         2
# 6       1    0         0

Note that this solution also works if dist does not start with 0, as the following example shows.

# Create an example data frame in which dist does not start with 0
dt_1 <- read.table(text = "Tree_ID dist
                 1       1
                 1       1
                 1       1
                 1       0
                 1       1
                 1       0",
                 header = TRUE, stringsAsFactors = FALSE)


dt2_1 <- dt_1 %>%
  mutate(unique_gr = rleid(dist)) %>%
  mutate(unique_gr = ifelse(dist != 0 & first(dist) == 0, unique_gr/2,
                            ifelse(dist != 0 & first(dist) != 0, (unique_gr + 1)/2, 0)))
dt2_1
#   Tree_ID dist unique_gr
# 1       1    1         1
# 2       1    1         1
# 3       1    1         1
# 4       1    0         0
# 5       1    1         2
# 6       1    0         0
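
For comparison, here is a package-free sketch of a different technique: instead of rleid, it marks where each run of 1s starts and counts those starts with cumsum. The helper number_runs and the second tree in the sample data are hypothetical additions for illustration, and the sketch assumes that the numbering should restart within each Tree_ID.

```r
# Hypothetical helper: number the runs of 1s in a 0/1 vector
number_runs <- function(dist) {
  # A run starts where dist is 1 and the previous value is not 1
  starts <- dist == 1 & c(0, head(dist, -1)) != 1
  # cumsum() counts the starts seen so far; multiplying by (dist == 1)
  # resets the rows between the runs to 0
  cumsum(starts) * (dist == 1)
}

# Sample data: tree 1 is from the question, tree 2 is made up
df <- data.frame(Tree_ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
                 dist    = c(0, 1, 1, 0, 1, 0, 1, 0, 1))

# ave() applies the helper within each Tree_ID
df$unique_gr <- ave(df$dist, df$Tree_ID, FUN = number_runs)
df
```

Because the starts are detected directly, this approach does not depend on whether dist begins with 0 or 1.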

Upvotes: 2
