Léo Joubert
Léo Joubert

Reputation: 542

extend data with data.table

I have this data.table

library(data.table)

data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

I would like to have this output :

data.table(
  id = c(1, 2),
  m1 = c("A", "B"),
  m2 = c("A", "B"),
  m3 = c("A", "B"),
  m4 = c("B", "B"),
  m5 = c("B", "B"),
  m6 = c("B", "B"),
  m7 = c("B", "B"),
  m8 = c("A", "B"),
  m9 = c("A", "B"),
  m10 = c("A", "B"),
  m11 = c("A", "A"),
  m12 = c("A", "A")
)

Those who used to do sequence analysis may have recognized that I'm trying to do what seqformat do in the TRaMiNeR package would do, but with higher performance due to use of data.table

Upvotes: 2

Views: 106

Answers (3)

Jaap
Jaap

Reputation: 83215

An alternative solution:

dt[, unlist(Map(`:`, begin, end)), by = .(id, state)
   ][, dcast(.SD, id ~ sprintf("m%02d", V1), value.var = "state")]

which gives:

   id m01 m02 m03 m04 m05 m06 m07 m08 m09 m10 m11 m12
1:  1   A   A   A   B   B   B   B   A   A   A   A   A
2:  2   B   B   B   B   B   B   B   B   B   B   A   A

It is possibly better to keep the data in long format. Long format is often easier to work with in R later on in your data processing/analysis.

You could achieve that with just:

dt[, unlist(Map(`:`, begin, end)), by = .(id, state)][order(id, V1)]

which gives:

    id state V1
 1:  1     A  1
 2:  1     A  2
 3:  1     A  3
 4:  1     B  4
 5:  1     B  5
 6:  1     B  6
 7:  1     B  7
 8:  1     A  8
 9:  1     A  9
10:  1     A 10
11:  1     A 11
12:  1     A 12
13:  2     B  1
14:  2     B  2
15:  2     B  3
16:  2     B  4
17:  2     B  5
18:  2     B  6
19:  2     B  7
20:  2     B  8
21:  2     B  9
22:  2     B 10
23:  2     A 11
24:  2     A 12

(where the [order(id, V1)]-part isn't necessary)


Used data:

dt <- data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

Upvotes: 1

akrun
akrun

Reputation: 887118

One option with data.table would be to melt the dataset after creating a sequence column, then grouped by 'i1', 'id', 'state', get the sequence of first and last 'value', dcast it from 'long' to 'wide'

dt1 <- melt(dt[, i1 := seq_len(.N)], id.vars = c("i1", "id", "state"))[,
      paste0("m", seq(first(value), last(value))), .(i1, id, state)]
dcast(dt1, id ~ V1, value.var = "state")[]
#    id m1 m10 m11 m12 m2 m3 m4 m5 m6 m7 m8 m9
#1:  1  A   A   A   A  A  A  B  B  B  B  A  A
#2:  2  B   B   A   A  B  B  B  B  B  B  B  B

Upvotes: 5

www
www

Reputation: 39154

A solution using the tidyverse.

library(tidyverse)
library(data.table)

dat <- data.table(
  id = c(rep(1, 3), rep(2, 2)),
  begin = c(1, 4, 8, 1, 11),
  end = c(3, 7, 12, 10, 12),
  state = c("A", "B", "A", "B", "A")
)

dat2 <- dat %>%
  mutate(Index = map2(begin, end, `:`)) %>%
  unnest() %>%
  mutate(Index = str_c("m", Index)) %>%
  select(id, state, Index) %>%
  spread(Index, state) %>%
  select(id, str_c("m", 1:(ncol(.) - 1)))
dat2
#   id m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12
# 1  1  A  A  A  B  B  B  B  A  A   A   A   A
# 2  2  B  B  B  B  B  B  B  B  B   B   A   A  

Upvotes: 2

Related Questions