Reputation: 542
I have this data.table
library(data.table)
data.table(
id = c(rep(1, 3), rep(2, 2)),
begin = c(1, 4, 8, 1, 11),
end = c(3, 7, 12, 10, 12),
state = c("A", "B", "A", "B", "A")
)
I would like to have this output :
data.table(
id = c(1, 2),
m1 = c("A", "B"),
m2 = c("A", "B"),
m3 = c("A", "B"),
m4 = c("B", "B"),
m5 = c("B", "B"),
m6 = c("B", "B"),
m7 = c("B", "B"),
m8 = c("A", "B"),
m9 = c("A", "B"),
m10 = c("A", "B"),
m11 = c("A", "A"),
m12 = c("A", "A")
)
Those who used to do sequence analysis may have recognized that I'm trying to do what seqformat
do in the TRaMiNeR
package would do, but with higher performance due to use of data.table
Upvotes: 2
Views: 106
Reputation: 83215
An alternative solution:
dt[, unlist(Map(`:`, begin, end)), by = .(id, state)
][, dcast(.SD, id ~ sprintf("m%02d", V1), value.var = "state")]
which gives:
id m01 m02 m03 m04 m05 m06 m07 m08 m09 m10 m11 m12 1: 1 A A A B B B B A A A A A 2: 2 B B B B B B B B B B A A
It is possibly better to keep the data in long format. Long format is often easier to work with in R later on in your data processing/analysis.
You could achieve that with just:
dt[, unlist(Map(`:`, begin, end)), by = .(id, state)][order(id, V1)]
which gives:
id state V1 1: 1 A 1 2: 1 A 2 3: 1 A 3 4: 1 B 4 5: 1 B 5 6: 1 B 6 7: 1 B 7 8: 1 A 8 9: 1 A 9 10: 1 A 10 11: 1 A 11 12: 1 A 12 13: 2 B 1 14: 2 B 2 15: 2 B 3 16: 2 B 4 17: 2 B 5 18: 2 B 6 19: 2 B 7 20: 2 B 8 21: 2 B 9 22: 2 B 10 23: 2 A 11 24: 2 A 12
(where the [order(id, V1)]
-part isn't necessary)
Used data:
dt <- data.table(
id = c(rep(1, 3), rep(2, 2)),
begin = c(1, 4, 8, 1, 11),
end = c(3, 7, 12, 10, 12),
state = c("A", "B", "A", "B", "A")
)
Upvotes: 1
Reputation: 887118
One option with data.table
would be to melt
the dataset after creating a sequence column, then grouped by 'i1', 'id', 'state', get the seq
uence of first
and last
'value', dcast
it from 'long' to 'wide'
dt1 <- melt(dt[, i1 := seq_len(.N)], id.vars = c("i1", "id", "state"))[,
paste0("m", seq(first(value), last(value))), .(i1, id, state)]
dcast(dt1, id ~ V1, value.var = "state")[]
# id m1 m10 m11 m12 m2 m3 m4 m5 m6 m7 m8 m9
#1: 1 A A A A A A B B B B A A
#2: 2 B B A A B B B B B B B B
Upvotes: 5
Reputation: 39154
A solution using the tidyverse
.
library(tidyverse)
library(data.table)
dat <- data.table(
id = c(rep(1, 3), rep(2, 2)),
begin = c(1, 4, 8, 1, 11),
end = c(3, 7, 12, 10, 12),
state = c("A", "B", "A", "B", "A")
)
dat2 <- dat %>%
mutate(Index = map2(begin, end, `:`)) %>%
unnest() %>%
mutate(Index = str_c("m", Index)) %>%
select(id, state, Index) %>%
spread(Index, state) %>%
select(id, str_c("m", 1:(ncol(.) - 1)))
dat2
# id m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12
# 1 1 A A A B B B B A A A A A
# 2 2 B B B B B B B B B B A A
Upvotes: 2