DV Hughes

Reputation: 305

How can I use R to loop over levels of two factors?

I am trying to code a factor variable to track between-year changes in another factor variable in R.

Suppose I have the following data structure:

df <- data.frame(id = rep(seq(from = 1, to = 5, by = 1), 5),
                 year = c(rep(2002, 5), rep(2004, 5), rep(2006, 5),
                          rep(2008, 5), rep(2010, 5)),
                 factor1 = rbinom(n = 25, size = 1, prob = 0.5))

df[, 1] <- as.factor(as.character(df[, 1]))

df[, 2] <- as.factor(as.character(df[, 2]))

df[, 3] <- as.factor(as.character(df[, 3]))

factor2<-c()

The loop structure is as follows:

for(i in levels(df[,1])){
  for(j in levels(df[,2])){
    if(df[,3]>0){factor2<-1}
    else(factor2<-0)
  }
}

Which returns:

factor2 as a numeric vector with 0 elements

My question is, how can I get this loop structure to work?

Upvotes: 1

Views: 10974

Answers (1)

Ajar

Reputation: 1826

First, your existing code replaces the contents of factor2 in each iteration of the loop. To add a new value without specifying an index, you can use the append() function. However, even with append(), your code as written will simply make factor2 a duplicate of df[, 3].
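A toy loop showing the difference between reassigning and appending. The names x, y and k are illustrative only and have nothing to do with df:

```r
# Plain assignment inside a loop overwrites the previous value each time
x <- c()
for (k in 1:3) {
  x <- k
}
print(x)  # [1] 3

# append() keeps what is already there and adds the new value at the end
y <- c()
for (k in 1:3) {
  y <- append(y, k)
}
print(y)  # [1] 1 2 3
```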

What I believe you want to do is create a new factor that is set to 1 if an ID's value of factor1 in a given year differs from its value in the previous year, and 0 otherwise. Try the following code, replacing your factor2<-c() line and continuing from there:

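factor2 <- vector()

for ( i in levels(df[, 1]) ) {

  # rows for this ID, in year order
  dummy <- df[df$id == i, ]
  dummy <- dummy[order(dummy$year), ]

  # first year: there is no previous year to compare against
  factor2 <- append(factor2, 0)

  # later years: 1 if factor1 changed since the previous year, else 0
  # (seq_len(...)[-1] is empty when an ID has only one row, unlike 2:1)
  for ( j in seq_len(nrow(dummy))[-1] ) {

    if ( dummy[j, 3] != dummy[j-1, 3] ) {
      factor2 <- append(factor2, 1)
    } else {
      factor2 <- append(factor2, 0)
    }

  }

}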

For each ID, this code first appends a 0 to factor2, because the first year has no previous year to compare against and so cannot show a change. Then, for each subsequent year, it checks whether that year's value of factor1 differs from the previous year's value. If so, it appends a 1 to factor2; otherwise it appends a 0.
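For a single ID, the same rule can also be written without an inner loop using base R's diff(). This is a swapped-in vectorized technique, not the answer's code, and the vector v holds made-up values for one hypothetical ID, assumed to be numeric and already in year order:

```r
# Made-up factor1 values for one ID over five consecutive years
v <- c(0, 0, 1, 1, 0)

# diff(v) holds the year-to-year changes; the first year has no previous
# year, so it is coded 0, and every later year is 1 if its value changed
changed <- c(0, as.integer(diff(v) != 0))
print(changed)  # [1] 0 0 1 0 1
```

Because factor1 in df is a factor, its values would first need converting back to numbers, for example with as.numeric(as.character(df$factor1)), before applying this per ID.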

At the end, for this example, factor2 will be a vector of length 25. However, since it was populated in ID order, you can't simply add it to df as it stands. Instead, sort df by ID (and by year within each ID) first, then add factor2 to the result as a new column. Hope this helps!
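A sketch of that last step, not part of the original answer: it runs a condensed version of the loop on toy data and then attaches the result. The hard-coded factor1 values (used instead of rbinom() so the output is predictable) and the name df_sorted are made up for illustration:

```r
# Toy data in the shape of the question, with fixed factor1 values
df <- data.frame(
  id      = factor(rep(1:5, times = 5)),
  year    = factor(rep(c(2002, 2004, 2006, 2008, 2010), each = 5)),
  factor1 = factor(c(0, 1, 0, 1, 0,
                     0, 1, 1, 1, 0,
                     1, 1, 1, 0, 0,
                     1, 0, 1, 0, 0,
                     1, 0, 1, 0, 1))
)

# Condensed loop: factor2 is filled ID by ID, each ID in year order
factor2 <- vector()
for (i in levels(df$id)) {
  dummy <- df[df$id == i, ]
  dummy <- dummy[order(dummy$year), ]
  factor2 <- append(factor2, 0)  # first year: nothing to compare against
  for (j in seq_len(nrow(dummy))[-1]) {
    changed <- dummy$factor1[j] != dummy$factor1[j - 1]
    factor2 <- append(factor2, as.numeric(changed))
  }
}

# factor2 is in ID order, so sort df by ID (then year) before attaching it
df_sorted <- df[order(df$id, df$year), ]
df_sorted$factor2 <- factor2
```

Sorting by both id and year, rather than by id alone, makes the row order match the order in which the loop visited the years, even if df was not originally stored year by year.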

Upvotes: 3
