Reputation: 43
I am trying to analyze some information about an object as it moves between three possible states. The data is laid out such that each state has its own column, with binary values in sequential order, e.g.,
df <- data.frame(state1 = c(0,0,0,1,1,0,0,1,0,0,0), state2 = c(1,1,1,0,0,1,1,0,1,0,0), state3 = c(0,0,0,0,0,0,0,0,0,1,1))
print(df)
   state1 state2 state3
1       0      1      0
2       0      1      0
3       0      1      0
4       1      0      0
5       1      0      0
6       0      1      0
7       0      1      0
8       1      0      0
9       0      1      0
10      0      0      1
11      0      0      1
I would like to add a fourth column indicating which state is occupied, while also preserving the sequence (that is, numbering each separate visit to a state), e.g.,
df2 <- data.frame(state1 = c(0,0,0,1,1,0,0,1,0,0,0), state2 = c(1,1,1,0,0,1,1,0,1,0,0), state3 = c(0,0,0,0,0,0,0,0,0,1,1), state.id = c(2.1, 2.1, 2.1, 1.1, 1.1, 2.2, 2.2, 1.2, 2.3, 3.1, 3.1))
print(df2)
   state1 state2 state3 state.id
1       0      1      0      2.1
2       0      1      0      2.1
3       0      1      0      2.1
4       1      0      0      1.1
5       1      0      0      1.1
6       0      1      0      2.2
7       0      1      0      2.2
8       1      0      0      1.2
9       0      1      0      2.3
10      0      0      1      3.1
11      0      0      1      3.1
How could I go about doing this (preferably via the dplyr package)? Thanks in advance.
Upvotes: 4
Views: 64
Reputation: 389135
We can use max.col to get the index of the column holding the maximum value in each row (temp). We also create a new column with row_number(), then create a sequential counter within each temp group which increments whenever the difference between consecutive row numbers is greater than 1.
library(dplyr)
df %>%
  mutate(temp = max.col(.),
         row = row_number()) %>%
  group_by(temp) %>%
  mutate(temp1 = cumsum(row - lag(row, default = first(row)) > 1) + 1,
         state.id = paste(temp, temp1, sep = ".")) %>%
  ungroup() %>%
  select(-temp, -temp1, -row)
# A tibble: 11 x 4
#   state1 state2 state3 state.id
#    <dbl>  <dbl>  <dbl> <chr>
# 1      0      1      0 2.1
# 2      0      1      0 2.1
# 3      0      1      0 2.1
# 4      1      0      0 1.1
# 5      1      0      0 1.1
# 6      0      1      0 2.2
# 7      0      1      0 2.2
# 8      1      0      0 1.2
# 9      0      1      0 2.3
#10      0      0      1 3.1
#11      0      0      1 3.1
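To see what the intermediate columns hold before they are dropped, here is a minimal base R sketch of the same logic. It assumes the df from the question. row_nr is a name introduced here for illustration, ave() stands in for the grouped mutate(), and ties.method = "first" is an added assumption to keep max.col deterministic (its default breaks ties at random, which does not matter here because each row has exactly one 1).

```r
# Same data as in the question.
df <- data.frame(
  state1 = c(0,0,0,1,1,0,0,1,0,0,0),
  state2 = c(1,1,1,0,0,1,1,0,1,0,0),
  state3 = c(0,0,0,0,0,0,0,0,0,1,1)
)

# Column index of the 1 in each row, i.e. the active state.
# ties.method = "first" is an assumption added for determinism.
temp <- max.col(df, ties.method = "first")

# Row numbers, playing the role of row_number() in the pipeline.
row_nr <- seq_len(nrow(df))

# Within each state, start a new run whenever consecutive row numbers
# of that state are more than 1 apart.
temp1 <- ave(row_nr, temp, FUN = function(r) cumsum(c(0, diff(r)) > 1) + 1)

state.id <- paste(temp, temp1, sep = ".")

# Show the intermediate columns next to the final id.
print(data.frame(temp, row_nr, temp1, state.id))
```

For state 2 the row numbers are 1, 2, 3, 6, 7, 9, so the gaps larger than 1 fall before rows 6 and 9, which is where temp1 steps from 1 to 2 and from 2 to 3.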
Upvotes: 3
Reputation: 15072
This is a little bit complicated because you want to do two things and combine them: identify which state is active in each row, and count how many times that state has become active so far.
I think the way to do this in the end is to reshape the data and use two groupings to get what you want:
- Group by rowid and use which to figure out the number of the state that is active.
- Group by state and use cumsum to get an index that goes up every time a state becomes active.
- Use filter to keep only the rows that correspond to the relevant column for each rowid, and right_join back onto the original data.
The second task in particular I think is hard to achieve without reshaping or dealing with an unwieldy number of columns.
library(tidyverse)
df <- data.frame(state1 = c(0,0,0,1,1,0,0,1,0,0,0), state2 = c(1,1,1,0,0,1,1,0,1,0,0), state3 = c(0,0,0,0,0,0,0,0,0,1,1))
df <- rowid_to_column(df)
df %>%
  gather(state, binary, -rowid) %>%
  arrange(state, rowid) %>%
  group_by(rowid) %>%
  mutate(state_num = which(binary == 1)) %>%
  group_by(state) %>%
  mutate(
    state_col_num = str_remove(state, "state") %>% as.integer(),
    sequence = cumsum(binary == 1 & lag(binary, default = 0) == 0)
  ) %>%
  ungroup() %>%
  filter(state_num == state_col_num) %>%
  unite("state.id", state_num, sequence, sep = ".") %>%
  right_join(df, by = "rowid") %>%
  select(rowid, matches("state(\\d+|\\.id)"))
#> # A tibble: 11 x 5
#>    rowid state.id state1 state2 state3
#>    <int> <chr>     <dbl>  <dbl>  <dbl>
#>  1     1 2.1           0      1      0
#>  2     2 2.1           0      1      0
#>  3     3 2.1           0      1      0
#>  4     4 1.1           1      0      0
#>  5     5 1.1           1      0      0
#>  6     6 2.2           0      1      0
#>  7     7 2.2           0      1      0
#>  8     8 1.2           1      0      0
#>  9     9 2.3           0      1      0
#> 10    10 3.1           0      0      1
#> 11    11 3.1           0      0      1
Created on 2019-08-07 by the reprex package (v0.3.0)
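The sequence counter is probably the least obvious part, so here is a small sketch of it on the state2 column alone. It uses base R in place of dplyr's lag (the shifted vector prev is meant to be equivalent to lag(binary, default = 0)). prev, starts and run_index are illustrative names, not part of the pipeline above.

```r
# The state2 column from the question.
binary <- c(1,1,1,0,0,1,1,0,1,0,0)

# Previous value of each element, with 0 standing in before the first row.
# Assumed equivalent to dplyr::lag(binary, default = 0).
prev <- c(0, head(binary, -1))

# TRUE only where a run of 1s begins: the value is 1 and the previous value was 0.
starts <- binary == 1 & prev == 0

# Running count of run starts, i.e. how many times the state has become active.
run_index <- cumsum(starts)

print(data.frame(binary, starts, run_index))
```

The counter keeps its last value on rows where the state is inactive, which is why the pipeline filters down to the active column for each rowid before building state.id.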
Upvotes: 2