Fanny0000

Reputation: 11

How to use WeightedCluster to aggregate sequences for multichannel sequence analysis

I have 54399 cases and 2 channels (HOM and HOS), and I want to run a multichannel sequence analysis. An example of the data:

ID HOM1 HOM2 HOM3 HOM4 HOS1 HOS2 HOS3 HOS4
1 A A B C NO YES NO NO
2 A B A A YES UNCERTAIN YES YES

I used the following code:

HOM.seq<-seqdef(df[, 2:5])
HOS.seq<-seqdef(df[, 6:9])
channels<-list(HOM.seq, HOS.seq)
MDdist<-seqMD(channels, method="OM", sm=list("TRATE", "TRATE"), what="diss")

However, I get a warning that "52322 unique sequences exceeds max allowed of 46340".

My question is: how can I use wcAggregateCases to reduce the number of unique sequences, given that the 52322 already seems to be the result of aggregating the 54399 sequences? Or can I apply wcAggregateCases to HOM and HOS separately before putting them in the channel list? Thanks.

I have already applied wcAggregateCases separately to HOM and HOS; this gives around 10000 aggregated cases for HOM and 7000 for HOS.

Upvotes: 1

Views: 31

Answers (1)

Matthias Studer

Reputation: 1732

You can compute the weights and the unique sequences from the combined sequence object. The combined sequence merges, at each time position, the states from the different channels, so two cases are identical only if they match on every channel. Here is an example of how to do so:

library(TraMineR)
data(biofam)

## Building one channel per type of event: left home, married, and child
bf <- as.matrix(biofam[, 10:25])
left <- bf==1 | bf==3 | bf==5 | bf==6
married <- bf == 2 | bf== 3 | bf==6
children <-  bf==4 | bf==5 | bf==6

## Building sequence objects
left.seq <- seqdef(left)
marr.seq <- seqdef(married)
child.seq <- seqdef(children)
channels <- list(LeftHome=left.seq, Marr=marr.seq, Child=child.seq)

## Retrieving the MD sequences or combined sequence
MDseq <- seqMD(channels)
## Now you have one sequence made by combining the different channels. 
alphabet(MDseq)

## Use wcAggregateCases() on the combined sequence
library(WeightedCluster)
ac <- wcAggregateCases(MDseq)
print(ac)
## Retrieving unique cases in the original data set
uniqueChannels <- list(LeftHome=left.seq[ac$aggIndex, ],
                       Marr=marr.seq[ac$aggIndex, ],
                       Child=child.seq[ac$aggIndex, ])
## Distance on unique data (one substitution-cost method per channel, so three here)
MDdist <- seqMD(uniqueChannels, method="OM", sm=list("TRATE", "TRATE", "TRATE"), what="diss")
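Once the distances are computed on the unique cases, the weights returned by wcAggregateCases() should be passed to the clustering step, and the result can be mapped back to the full data set with ac$disaggIndex. The following is only a minimal sketch of that workflow on the same biofam channels; the choice of Ward hierarchical clustering and of k=4 is an arbitrary assumption for illustration, not a recommendation.

```r
library(TraMineR)
library(WeightedCluster)
data(biofam)

## Same three channels as in the example above
bf <- as.matrix(biofam[, 10:25])
left.seq  <- seqdef(bf==1 | bf==3 | bf==5 | bf==6)
marr.seq  <- seqdef(bf==2 | bf==3 | bf==6)
child.seq <- seqdef(bf==4 | bf==5 | bf==6)
channels <- list(LeftHome=left.seq, Marr=marr.seq, Child=child.seq)

## Aggregate identical cases using the combined (multichannel) sequences
ac <- wcAggregateCases(seqMD(channels))

## Keep only the unique cases in every channel, then compute the distances
uniqueChannels <- lapply(channels, function(s) s[ac$aggIndex, ])
MDdist <- seqMD(uniqueChannels, method="OM",
                sm=list("TRATE", "TRATE", "TRATE"), what="diss")

## Cluster the unique cases, weighting each by the number of cases it stands for
## (k=4 is an arbitrary choice for this illustration)
hc <- hclust(as.dist(MDdist), method="ward.D", members=ac$aggWeights)
uniqueClust <- cutree(hc, k=4)

## Map the cluster membership back to every case of the original data
fullClust <- uniqueClust[ac$disaggIndex]
table(fullClust)
```

Note that with "TRATE" the substitution costs are derived from the sequences passed to seqMD(), here the unique cases rather than the full sample, so the costs may differ slightly from those obtained on the complete data.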

Upvotes: 0
