How to assign sequential identities to binary states with dplyr?

Question

I am trying to analyze some information about an object as it moves between three possible states. The data is laid out such that each state has its own column, with binary values in sequential order, e.g.,

df <- data.frame(state1 = c(0,0,0,1,1,0,0,1,0,0,0), state2 = c(1,1,1,0,0,1,1,0,1,0,0), state3 = c(0,0,0,0,0,0,0,0,0,1,1))

print(df)

   state1 state2 state3
1       0      1      0
2       0      1      0
3       0      1      0
4       1      0      0
5       1      0      0
6       0      1      0
7       0      1      0
8       1      0      0
9       0      1      0
10      0      0      1
11      0      0      1

I would like to add a fourth column indicating which state is occupied while preserving sequence, so that each separate visit to a state gets its own counter, e.g.,

df2 <- data.frame(state1 = c(0,0,0,1,1,0,0,1,0,0,0), state2 = c(1,1,1,0,0,1,1,0,1,0,0), state3 = c(0,0,0,0,0,0,0,0,0,1,1), state.id = c(2.1, 2.1, 2.1, 1.1, 1.1, 2.2, 2.2, 1.2, 2.3, 3.1, 3.1))

print(df2)

   state1 state2 state3 state.id
1       0      1      0      2.1
2       0      1      0      2.1
3       0      1      0      2.1
4       1      0      0      1.1
5       1      0      0      1.1
6       0      1      0      2.2
7       0      1      0      2.2
8       1      0      0      1.2
9       0      1      0      2.3
10      0      0      1      3.1
11      0      0      1      3.1

How could I go about doing this (preferably via the dplyr package)? Thanks in advance.

Ronak Shah · Accepted Answer

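We can use max.col to get the column index of the maximum value in each row (temp), which here is the occupied state. We also store the row number with row_number(). Then, within each temp group, we build a counter (temp1) that increments whenever the gap between consecutive row numbers in the group is greater than 1, i.e. whenever the object has left that state and come back.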

library(dplyr)

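df %>%
   # temp: index of the column holding the 1; row: original row position
   mutate(temp = max.col(.), 
          row = row_number()) %>%
   group_by(temp) %>%
   # temp1: visit counter, bumped whenever the rows of this state are not adjacent
   mutate(temp1 = cumsum(row - lag(row, default = first(row)) > 1) + 1,
          state.id = paste(temp, temp1, sep = ".")) %>%
   ungroup() %>%
   # drop the helper columns
   select(-temp, -temp1, -row)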

# A tibble: 11 x 4
#   state1 state2 state3 state.id
#    <dbl>  <dbl>  <dbl> <chr>
# 1      0      1      0 2.1     
# 2      0      1      0 2.1     
# 3      0      1      0 2.1     
# 4      1      0      0 1.1     
# 5      1      0      0 1.1     
# 6      0      1      0 2.2     
# 7      0      1      0 2.2     
# 8      1      0      0 1.2     
# 9      0      1      0 2.3     
#10      0      0      1 3.1     
#11      0      0      1 3.1
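
For comparison, here is a minimal base R sketch of the same idea, which may help when checking the intermediate values. It is not part of the original answer; the helper names state, run_start and visit are made up for illustration, and it assumes every row contains exactly one 1.

```r
df <- data.frame(state1 = c(0,0,0,1,1,0,0,1,0,0,0),
                 state2 = c(1,1,1,0,0,1,1,0,1,0,0),
                 state3 = c(0,0,0,0,0,0,0,0,0,1,1))

# Column index of the 1 in each row (assumes one-hot rows, so there are no ties)
state <- max.col(as.matrix(df), ties.method = "first")

# TRUE on the first row of every run of identical consecutive states
run_start <- c(TRUE, diff(state) != 0)

# Within each state, count how many runs have started so far
visit <- ave(as.integer(run_start), state, FUN = cumsum)

df$state.id <- paste(state, visit, sep = ".")
print(df)
```

Note that state.id is a character column here, as in the dplyr version, whereas df2 in the question stores it as numeric. Keeping it as character is probably safer: after conversion with as.numeric(), a tenth visit ("2.10") would be indistinguishable from a first visit ("2.1").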
