Reputation: 13
I am trying to define distinct groups based on one variable. This seems like a simple question, but I couldn't figure it out.
In my dataset, each tree has a number of distinct groups (consecutive rows with value 1 in the "dist" variable). I would like to create a new variable that assigns a unique value to each group.
My data looks like:
Tree_ID dist
1 0
1 1
1 1
1 0
1 1
1 0
I would like to create a new variable (unique_gr) that assigns a unique value to each distinct group of rows with "dist == 1":
Tree_ID dist unique_gr
1 0 0
1 1 1
1 1 1
1 0 0
1 1 2
1 0 0
I have tried to use the "ifelse" function to check the current row, where "dist == 0" means no group:
ifelse(dist == 1, "unique_gr", 0) # checking the current row
The main issue is: how can I assign values in "unique_gr" that are different and increasing (e.g. 1, 2, 3, 4, ...) for each distinct group?
Thank you for your help.
Upvotes: 1
Views: 75
Reputation: 887118
Here is another option using data.table
library(data.table)
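# Number consecutive runs of dist within each tree; multiplying by dist zeroes out the dist == 0 runs
setDT(df1)[, unique_gr := rleid(dist) * dist, by = Tree_ID][
  # Renumber the remaining run ids as 1, 2, 3, ... (grouped by Tree_ID so numbering restarts per tree)
  unique_gr != 0, unique_gr := match(unique_gr, unique(unique_gr)), by = Tree_ID]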
# Tree_ID dist unique_gr
#1: 1 0 0
#2: 1 1 1
#3: 1 1 1
#4: 1 0 0
#5: 1 1 2
#6: 1 0 0
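To see why this works, the intermediate values can be reproduced in base R. This is only a minimal sketch for a single tree, assuming the six-row example from the question; run_id is a hypothetical helper that mimics rleid with rle, not part of data.table.

```r
# dist values from the question (a single tree)
dist <- c(0, 1, 1, 0, 1, 0)

# Hypothetical helper: base R stand-in for data.table::rleid()
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

ids    <- run_id(dist)   # one id per consecutive run: 1 2 2 3 4 5
masked <- ids * dist     # runs with dist == 0 become 0:  0 2 2 0 4 0

# Renumber the surviving run ids in order of first appearance
nz <- masked != 0
unique_gr <- masked
unique_gr[nz] <- match(masked[nz], unique(masked[nz]))
unique_gr                # 0 1 1 0 2 0
```

The match(..., unique(...)) step is what turns the gappy run ids (2, 4, ...) into consecutive labels (1, 2, ...).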
Upvotes: 1
Reputation: 39154
A solution using tidyverse and data.table. The key is to use the rleid function.
# Create example data frame
dt <- read.table(text = "Tree_ID dist
1 0
1 1
1 1
1 0
1 1
1 0 ",
header = TRUE, stringsAsFactors = FALSE)
library(tidyverse)
library(data.table)
dt2 <- dt %>%
  mutate(unique_gr = rleid(dist)) %>%
  mutate(unique_gr = ifelse(dist != 0 & first(dist) == 0, unique_gr / 2,
                     ifelse(dist != 0 & first(dist) != 0, (unique_gr + 1) / 2, 0)))
dt2
Tree_ID dist unique_gr
1 1 0 0
2 1 1 1
3 1 1 1
4 1 0 0
5 1 1 2
6 1 0 0
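The halving step relies on runs of 0 and 1 alternating: if dist starts with 0, the runs of 1 get even run ids (2, 4, ...), otherwise odd ones (1, 3, ...). A rough base R sketch of that arithmetic, assuming dist only ever takes the values 0 and 1; halve_ids is a hypothetical name used for illustration, and rle stands in for rleid.

```r
# Hypothetical helper illustrating the halving arithmetic (assumes dist is 0/1 only)
halve_ids <- function(dist) {
  r   <- rle(dist)
  ids <- rep(seq_along(r$lengths), r$lengths)  # base R stand-in for rleid(dist)
  # Runs of 1 have even ids when dist starts with 0, odd ids otherwise
  offset <- if (dist[1] != 0) 1 else 0
  ifelse(dist != 0, (ids + offset) / 2, 0)
}

halve_ids(c(0, 1, 1, 0, 1, 0))  # 0 1 1 0 2 0
halve_ids(c(1, 1, 1, 0, 1, 0))  # 1 1 1 0 2 0
```

If dist could contain other values, the runs would no longer strictly alternate and this shortcut would not apply.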
Notice that this solution will also work if dist does not begin with 0, as the following example shows.
# Create an example data frame where dist does not begin with 0
dt_1 <- read.table(text = "Tree_ID dist
1 1
1 1
1 1
1 0
1 1
1 0 ",
header = TRUE, stringsAsFactors = FALSE)
dt2_1 <- dt_1 %>%
  mutate(unique_gr = rleid(dist)) %>%
  mutate(unique_gr = ifelse(dist != 0 & first(dist) == 0, unique_gr / 2,
                     ifelse(dist != 0 & first(dist) != 0, (unique_gr + 1) / 2, 0)))
dt2_1
Tree_ID dist unique_gr
1 1 1 1
2 1 1 1
3 1 1 1
4 1 0 0
5 1 1 2
6 1 0 0
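As a different technique (not the rleid approach above), the groups can also be numbered by counting where each run of 1s starts, which avoids the special case for the first value. The sketch below is base R only and assumes dist is 0/1; label_runs is a hypothetical helper, and the rows for Tree_ID 2 are made up purely to show the numbering restarting per tree.

```r
# Tree 1 is the example from the question; tree 2 is made-up illustration data
df <- data.frame(
  Tree_ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  dist    = c(0, 1, 1, 0, 1, 0, 1, 0, 1)
)

# Hypothetical helper: label each run of 1s as 1, 2, 3, ... and leave 0s as 0
label_runs <- function(d) {
  prev   <- c(0, head(d, -1))    # previous row's value (0 before the first row)
  starts <- d == 1 & prev == 0   # TRUE where a new run of 1s begins
  cumsum(starts) * d             # running count of runs, zeroed where d == 0
}

# Apply per tree so that numbering restarts within each Tree_ID
df$unique_gr <- ave(df$dist, df$Tree_ID, FUN = label_runs)
df$unique_gr  # 0 1 1 0 2 0 1 0 2
```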
Upvotes: 2