shadi

Reputation: 73

How to loop through columns in R

I have a very large data set with 250 string and numeric variables. I want to compare consecutive pairs of columns: the first variable with the second (for example, their difference), the third with the fourth, the fifth with the sixth, and so on.
The data set is structured like the example below: I want to compare number.x with number.y, day.x with day.y, school.x with school.y, etc.

number.x<-c(1,2,3,4,5,6,7)
number.y<-c(3,4,5,6,1,2,7)
day.x<-c(1,3,4,5,6,7,8)
day.y<-c(4,5,6,7,8,7,8)
school.x<-c("a","b","b","c","n","f","h")
school.y<-c("a","b","b","c","m","g","h")
city.x<- c(1,2,3,7,5,8,7)
city.y<- c(1,2,3,5,5,7,7) 

Upvotes: 0

Views: 3190

Answers (1)

desertnaut

Reputation: 60319

You mean, something like this?

> number.x == number.y
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
> length(which(number.x==number.y))
[1] 1
> school.x == school.y
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
> test.day <- day.x == day.y
> test.day
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
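As a small aside, the count of matches can also be written with sum(), since TRUE counts as 1. If "difference" in the question means an arithmetic difference for numeric columns, plain subtraction does that element by element. The names n.equal and num.diff below are only illustrative:

```r
number.x <- c(1, 2, 3, 4, 5, 6, 7)
number.y <- c(3, 4, 5, 6, 1, 2, 7)

# Number of positions where the two vectors agree (TRUE is counted as 1)
n.equal <- sum(number.x == number.y)

# Element-wise arithmetic difference (only meaningful for numeric columns)
num.diff <- number.x - number.y
```

Subtraction will fail on character columns such as school.x, so for mixed data the == comparison is the safer default.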

EDIT: Given your example variables above, we have:

df <- data.frame(number.x,
             number.y,
             day.x,
             day.y,
             school.x,
             school.y,
             city.x,
             city.y,
             stringsAsFactors=FALSE)

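n <- ncol(df)  # number of columns (assumed to be even)

k <- 1          # index of the first column in the current pair
comp <- list()  # comparisons will be stored here

while (k <= n-1) {
      l <- (k+1)/2                       # pair number: 1, 2, 3, ...
      comp[[l]] <- df[,k] == df[,k+1]    # element-wise comparison of the pair
      k <- k+2                           # move on to the next pair
}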

After which, you'll have:

> comp
[[1]]
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

[[2]]
[1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

[[3]]
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE

[[4]]
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

To get the comparison result for columns k and k+1 (with k odd), look at element (k+1)/2 of comp. For example, the comparison of columns 7 and 8 is in element 8/2 = 4 of comp:

> comp[[4]]
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
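The same position-based pairing can be sketched without an explicit counter by using Map() over the odd and even column positions. This is only an alternative sketch, not a replacement for the loop above; it assumes the columns strictly alternate .x, .y, and the names odd, even and comp2 are made up for illustration:

```r
# Rebuild the example data frame from the question
df <- data.frame(
  number.x = c(1, 2, 3, 4, 5, 6, 7),
  number.y = c(3, 4, 5, 6, 1, 2, 7),
  day.x    = c(1, 3, 4, 5, 6, 7, 8),
  day.y    = c(4, 5, 6, 7, 8, 7, 8),
  school.x = c("a", "b", "b", "c", "n", "f", "h"),
  school.y = c("a", "b", "b", "c", "m", "g", "h"),
  city.x   = c(1, 2, 3, 7, 5, 8, 7),
  city.y   = c(1, 2, 3, 5, 5, 7, 7),
  stringsAsFactors = FALSE
)

odd  <- seq(1, ncol(df), by = 2)  # positions 1, 3, 5, ... (the .x columns)
even <- odd + 1                   # positions 2, 4, 6, ... (the .y columns)

# Compare each odd column with the column immediately to its right
comp2 <- Map(function(a, b) a == b, df[odd], df[even])

# Label each result with the shared prefix, e.g. "number" (assumes a ".x" suffix)
names(comp2) <- sub("\\.x$", "", names(df)[odd])
```

With named elements, comp2$city can be used in place of working out that columns 7 and 8 correspond to element 4.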

EDIT 2: To have the comparisons as new columns in the dataframe:

new.names <- paste0('V', seq_len(n/2))  # "V1", "V2", ..., one name per pair

cc <- as.data.frame(comp, optional=TRUE)
names(cc) <- new.names

df.new <- cbind(df, cc)

After which, you have:

> df.new
  number.x number.y day.x day.y school.x school.y city.x city.y    V1    V2    V3    V4
1        1        3     1     4        a        a      1      1 FALSE FALSE  TRUE  TRUE
2        2        4     3     5        b        b      2      2 FALSE FALSE  TRUE  TRUE
3        3        5     4     6        b        b      3      3 FALSE FALSE  TRUE  TRUE
4        4        6     5     7        c        c      7      5 FALSE FALSE  TRUE FALSE
5        5        1     6     8        n        m      5      5 FALSE FALSE FALSE  TRUE
6        6        2     7     7        f        g      8      7 FALSE  TRUE FALSE FALSE
7        7        7     8     8        h        h      7      7  TRUE  TRUE  TRUE  TRUE
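If the columns are not guaranteed to sit next to each other, one possible variation is to pair them by name instead of by position. The sketch below assumes every column ending in .x has a partner with the same prefix ending in .y (the suffixes merge() produces by default); x.cols, y.cols, same and df.new2 are hypothetical names, as is the ".same" suffix for the new columns:

```r
# Rebuild the example data frame from the question
df <- data.frame(
  number.x = c(1, 2, 3, 4, 5, 6, 7),
  number.y = c(3, 4, 5, 6, 1, 2, 7),
  day.x    = c(1, 3, 4, 5, 6, 7, 8),
  day.y    = c(4, 5, 6, 7, 8, 7, 8),
  school.x = c("a", "b", "b", "c", "n", "f", "h"),
  school.y = c("a", "b", "b", "c", "m", "g", "h"),
  city.x   = c(1, 2, 3, 7, 5, 8, 7),
  city.y   = c(1, 2, 3, 5, 5, 7, 7),
  stringsAsFactors = FALSE
)

# Find the .x columns and derive the name of each partner .y column
x.cols <- grep("\\.x$", names(df), value = TRUE)
y.cols <- sub("\\.x$", ".y", x.cols)
stopifnot(all(y.cols %in% names(df)))  # every .x column needs a .y partner

# Compare each pair and collect the results as logical columns
same <- as.data.frame(Map(function(a, b) a == b, df[x.cols], df[y.cols]))
names(same) <- paste0(sub("\\.x$", "", x.cols), ".same")

# Append the comparison columns to the original data
df.new2 <- cbind(df, same)
```

This gives columns such as number.same and city.same rather than V1 to V4, which may be easier to read with 250 variables.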

Upvotes: 1
