Reputation: 305
I am trying to code a factor variable to track between-year changes in another factor variable in R.
Suppose I have the following data structure:
df <- data.frame(id = rep(1:5, times = 5),
                 year = rep(c(2002, 2004, 2006, 2008, 2010), each = 5),
                 factor1 = rbinom(n = 25, size = 1, prob = 0.5))
df[,1]<-as.factor(as.character(df[,1]))
df[,2]<-as.factor(as.character(df[,2]))
df[,3]<-as.factor(as.character(df[,3]))
factor2<-c()
The loop structure is as follows:
for(i in levels(df[,1])){
for(j in levels(df[,2])){
if(df[,3]>0){factor2<-1}
else(factor2<-0)
}
}
Which returns:
factor2 as a numeric vector with 0 elements
My question is, how can I get this loop structure to work?
Upvotes: 1
Views: 10974
Reputation: 1826
First, your existing code replaces the contents of factor2 in each iteration of the loop. To add a new value without specifying an index, you can use the append() function. However, even with append(), your code as written will simply make factor2 a duplicate of df[, 3].
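To illustrate the difference between overwriting and appending, here is a minimal sketch. The vector vals and the names overwritten and appended are made up for this illustration and are not part of your data:

```r
# Hypothetical input values, used only for this illustration
vals <- c(0, 1, 1, 0)

# Overwriting: each iteration replaces the previous value,
# so only the value from the last iteration survives
overwritten <- c()
for (v in vals) {
  overwritten <- v
}

# Appending: each iteration adds one element to the end of the vector
appended <- c()
for (v in vals) {
  appended <- append(appended, v)
}

length(overwritten)  # a single element
length(appended)     # one element per iteration
```

Growing a vector with append() is fine for 25 rows; for large data, preallocating or a vectorized approach is usually faster.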
What I believe you want to do is create a new factor that is set to 1 if factor1 for a given ID in year X differs from its value in the previous year. Try the following code, replacing your factor2<-c() line and continuing from there:
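factor2 <- vector()
for (i in levels(df[, 1])) {
  # Rows for this ID (assumed to be in year order, as in the example data)
  dummy <- df[df$id == i, ]
  # First year: there is no previous year to compare against, so record 0
  factor2 <- append(factor2, 0)
  for (j in 2:length(dummy[, 2])) {
    # 1 if factor1 changed since the previous year, otherwise 0
    if (dummy[j, 3] != dummy[j - 1, 3]) {
      factor2 <- append(factor2, 1)
    } else {
      factor2 <- append(factor2, 0)
    }
  }
}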
This code first appends a 0 to factor2 for each ID, since factor1 can never differ in the first year: there is no previous year for comparison. Then, for each subsequent year, it checks whether the new value of factor1 differs from the previous year's value. If so, it appends a 1 to factor2; otherwise it appends a 0.
At the end, for this example, factor2 will be a length-25 vector. However, since it was populated in ID order, you can't just add it to df. Instead, sort df on ID first, then add factor2 to the result as a new column. Hope this helps!
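The sort-then-attach step could look roughly like the sketch below. It rebuilds the example data so that it runs on its own; the fixed seed, the name df_sorted, and the ave() cross-check at the end are my additions for illustration, not something from your code:

```r
set.seed(42)  # assumption: fixed seed, only so the example is reproducible

df <- data.frame(
  id      = factor(rep(1:5, times = 5)),
  year    = factor(rep(c(2002, 2004, 2006, 2008, 2010), each = 5)),
  factor1 = factor(rbinom(n = 25, size = 1, prob = 0.5))
)

# Sort by ID, then year, so the rows match the order in which the loop
# visits them
df_sorted <- df[order(df$id, df$year), ]

factor2 <- vector()
for (i in levels(df_sorted$id)) {
  dummy <- df_sorted[df_sorted$id == i, ]
  factor2 <- append(factor2, 0)  # first year: nothing to compare against
  for (j in 2:nrow(dummy)) {
    changed <- dummy[j, "factor1"] != dummy[j - 1, "factor1"]
    factor2 <- append(factor2, if (changed) 1 else 0)
  }
}

# Attach the result as a new factor column
df_sorted$factor2 <- factor(factor2)

# Optional cross-check without an explicit loop, using base R's ave():
# within each ID, flag positions where the value differs from the one before
change_vec <- ave(
  as.integer(as.character(df_sorted$factor1)),
  df_sorted$id,
  FUN = function(x) c(0, as.integer(diff(x) != 0))
)
all(change_vec == factor2)
```

If both approaches agree, the ave() version may be the more convenient one for larger data, since it avoids growing a vector inside nested loops.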
Upvotes: 3