Multiple Correspondence Analysis on longitudinal data

Question

I would like to explore the profile of two modalities of a categorical variable over time with respect to a given set of other categorical variables. I paste a reproducible example of such a dataset below.

set.seed(90114)
V1<-sample(rep(c("a", "A"), 100))
V2<-sample(rep(c("a", "A", "b", "B"), 50))
V3<-sample(rep(c("F", "M", "I"), 67), 200)
V4<-sample(rep(c("C", "R"), 100))
V5<-sample(rep(c(1970, 1980, 1990, 2000, 2010), 40))
data<-data.frame(V1, V2, V3, V4, V5)

To explore the behavior of such modalities, I decided to use Multiple Correspondence Analysis (package FactoMineR). To account for variation over time, one possibility is to split the dataset into 5 subsamples which represent the different levels of V5 and then run MCA on each subset. The rest of the analysis consists in comparing the position of the modalities across the different biplots. However, such practice is not without problems if the original dataset is too small. In such a case, the dimensions could be flipped or worse, the location of the active variables are likely to change from one plot to the other.

To avoid the problem, one solution could be to stabilize the position of the active variables across all the subsets and predict the coordinates of the supplementary variable afterwards, allowing the latter to move over time. I read somewhere that the coordinates of a modality can be obtained by computing the weighted mean of the coordinates of individuals in which this modality is found. So finding the coordinates of a modality for the year 1970 would boil down to computing the weighted mean of the coordinates of the individuals in the 1970 subset for that modality. However, I don't know whether it's common practice and if yes, I just don't know how to implement such calculations. I paste the rest of the code in order for you to visualize the problem.

data.mca<-MCA(data[, -5], quali.sup=1, graph=F)

# Retrieve the coordinates of the first and second dimension

DIM1<-data.mca$ind$coord[, 1]
DIM2<-data.mca$ind$coord[, 2]

# Append the coordinates to the original dataframe

data1<-data.frame(data, DIM1, DIM2)

# Split the data into 5 clusters according to V5 ("year")

data1.split<-split(data1, data1$V5)
data1.split<-lapply(data1.split, function(x) x=x[, -5]) # to remove the fifth column with the years, no longer needed
seventies<-as.data.frame(data1.split[1])
eightties<-as.data.frame(data1.split[2])
# ...

a.1970<-seventies[seventies$X1970.V1=="a",]
A.1970<-seventies[seventies$X1970.V1=="A",]

# The idea, then, is to find the coordinates of the modalities "a" and "A" by computing the weighted mean of their respective indivuduals for each subset. The arithmetic mean would yield

# a.1970.DIM1<-mean(a.1970$X1970.DIM1) # 0.0818
# a.1970.DIM2<-mean(a.1970$X1970.DIM2) # 0.1104

# and so on for the other levels of V5.

I thank you in advance for your help!

Multiple Correspondence Analysis on longitudinal data

Answers (1)

Related Questions