Reputation: 73
I have a very large data set including 250 string and numeric variables. I want to compare one after another columns together. For example, I am going to compare (difference) the first variable with second one, third one with fourth one, fifth one with sixth one and so on.
For example (The structure of the data set is something like this example), I want to compare number.x with number.y, day.x with day.y, school.x with school.y and etc.
number.x<-c(1,2,3,4,5,6,7)
number.y<-c(3,4,5,6,1,2,7)
day.x<-c(1,3,4,5,6,7,8)
day.y<-c(4,5,6,7,8,7,8)
school.x<-c("a","b","b","c","n","f","h")
school.y<-c("a","b","b","c","m","g","h")
city.x<- c(1,2,3,7,5,8,7)
city.y<- c(1,2,3,5,5,7,7)
Upvotes: 0
Views: 3190
Reputation: 60319
You mean, something like this?
> number.x == number.y
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
> length(which(number.x==number.y))
[1] 1
> school.x == school.y
[1] TRUE TRUE TRUE TRUE FALSE FALSE TRUE
> test.day <- day.x == day.y
> test.day
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE
EDIT: Given your example variables above, we have:
df <- data.frame(number.x,
number.y,
day.x,
day.y,
school.x,
school.y,
city.x,
city.y,
stringsAsFactors=FALSE)
n <- ncol(df) # no of columns (assumed EVEN number)
k <- 1
comp <- list() # comparisons will be stored here
while (k <= n-1) {
l <- (k+1)/2
comp[[l]] <- df[,k] == df[,k+1]
k <- k+2
}
After which, you'll have:
> comp
[[1]]
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[[2]]
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE
[[3]]
[1] TRUE TRUE TRUE TRUE FALSE FALSE TRUE
[[4]]
[1] TRUE TRUE TRUE FALSE TRUE FALSE TRUE
To get the comparison result between columns k
and k+1
, you look at the (k+1)/2
element of comp
- i.e to get the comparison results between columns 7 & 8, you look at the comp
element 8/2=4
:
> comp[[4]]
[1] TRUE TRUE TRUE FALSE TRUE FALSE TRUE
EDIT 2: To have the comparisons as new columns in the dataframe:
new.names <- rep('', n/2)
for (i in 1:(n/2)) {
new.names[i] <- paste0('V', i)
}
cc <- as.data.frame(comp, optional=TRUE)
names(cc) <- new.names
df.new <- cbind(df, cc)
After which, you have:
> df.new
number.x number.y day.x day.y school.x school.y city.x city.y V1 V2 V3 V4
1 1 3 1 4 a a 1 1 FALSE FALSE TRUE TRUE
2 2 4 3 5 b b 2 2 FALSE FALSE TRUE TRUE
3 3 5 4 6 b b 3 3 FALSE FALSE TRUE TRUE
4 4 6 5 7 c c 7 5 FALSE FALSE TRUE FALSE
5 5 1 6 8 n m 5 5 FALSE FALSE FALSE TRUE
6 6 2 7 7 f g 8 7 FALSE TRUE FALSE FALSE
7 7 7 8 8 h h 7 7 TRUE TRUE TRUE TRUE
Upvotes: 1