I have this data frame:
ID <- c(1,1,2,3,3,3,4,5,6,6)
linguistic_fluency <- c("good", "very good", "bad", "bad", "very bad", "very good", "good", "very good", "normal", "very bad")
survey_year <- c(2007, 2008, 2009, 2009, 2008, 2007, 2007, 2008, 2007, 2008)
data <- data.frame(ID, linguistic_fluency, survey_year)
I would like to check whether the survey participants report their linguistic fluency consistently over the years. Therefore, I would like a cross-tabulation in which the rows give the fluency reported in year t and the columns give the fluency reported in year t-1.
I really appreciate your help. Thank you.
You could lag the variable within each respondent and then make a frequency table. For example:
# Re-order the factor levels first
data$linguistic_fluency <- factor(data$linguistic_fluency,
                                  levels = c("very bad", "bad", "normal", "good", "very good"))
# Sort by respondent and survey year, so that the previous row within an ID
# really is year t-1 (ID 3 is stored as 2009, 2008, 2007 in the original data).
# This assumes no respondent skips a year; a gap would pair year t with an
# earlier year than t-1.
data <- data[order(data$ID, data$survey_year), ]
library(Hmisc) # load library containing the Lag() function
# apply Lag() separately within each respondent
data$Lag_fluency <- unlist(tapply(data$linguistic_fluency, data$ID, function(x) Lag(x, 1)))
# resulting in the following data frame. Lag() returns NA for each respondent's
# first year, so respondents with only one observation never get a lagged value
> data
   ID linguistic_fluency survey_year Lag_fluency
1   1               good        2007        <NA>
2   1          very good        2008        good
3   2                bad        2009        <NA>
6   3          very good        2007        <NA>
5   3           very bad        2008   very good
4   3                bad        2009    very bad
7   4               good        2007        <NA>
8   5          very good        2008        <NA>
9   6             normal        2007        <NA>
10  6           very bad        2008      normal
Then all you need is a frequency table between the current and the lagged variable. Passing the current value first puts year t in the rows and year t-1 in the columns, as requested:
> table(t = data$linguistic_fluency, "t-1" = data$Lag_fluency)
           t-1
t           very bad bad normal good very good
  very bad         0   0      1    0         1
  bad              1   0      0    0         0
  normal           0   0      0    0         0
  good             0   0      0    0         0
  very good        0   0      0    1         0
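If you would rather not depend on Hmisc, or if some respondents skip a survey year, a base-R alternative is to merge the data with a copy of itself shifted forward by one year, matching on ID and survey_year. Each answer is then paired only with the same respondent's answer from the immediately preceding calendar year, regardless of row order. This is only a sketch; the names `prev`, `prev_fluency`, `pairs` and `tab` are illustrative choices, not part of the original answer.

```r
ID <- c(1,1,2,3,3,3,4,5,6,6)
linguistic_fluency <- c("good", "very good", "bad", "bad", "very bad", "very good",
                        "good", "very good", "normal", "very bad")
survey_year <- c(2007, 2008, 2009, 2009, 2008, 2007, 2007, 2008, 2007, 2008)
lev <- c("very bad", "bad", "normal", "good", "very good")
data <- data.frame(ID, survey_year,
                   linguistic_fluency = factor(linguistic_fluency, levels = lev))

# Copy of the data moved forward one year: the answer given in year t-1
# is relabelled as year t so it lines up with the current answer
prev <- data.frame(ID = data$ID,
                   survey_year = data$survey_year + 1,
                   prev_fluency = data$linguistic_fluency)

# Inner join: keeps only observations whose respondent was also
# surveyed in the immediately preceding year
pairs <- merge(data, prev, by = c("ID", "survey_year"))

# Rows: fluency in year t; columns: fluency in year t-1
tab <- table(t = pairs$linguistic_fluency, "t-1" = pairs$prev_fluency)
print(tab)
```

Because the factor levels are kept through the merge, all five categories appear in both dimensions even when some are never observed. The diagonal of `tab` counts respondents who gave the same answer in t-1 and t, so `sum(diag(tab)) / sum(tab)` gives the share of consistent answers (0 for this small example).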