Reputation: 2666
I am iteratively fitting models to many different variables, and in a few rare cases two columns I am using as independent variables contain an identical set of values. This makes the model unidentifiable and throws an error. I would like a way to check if any columns are identical to any other columns within a dataframe, and then return the names of the columns that have a problem. Here is an example dataframe.
a <- rnorm(10)
b <- rnorm(10)
c <- a
d <- rnorm(10)
dat <- data.frame(a,b,c,d)
Others have already answered how to test whether two individual columns in a dataframe are identical. However, I would like a way to check every column against every other column.
Upvotes: 2
Views: 1166
Reputation: 28675
You can use combn to get all pairs of column numbers, then apply over the resulting matrix to check if all elements are equal.
pairs <- t(combn(seq_len(ncol(dat)), 2))
same <- apply(pairs, 1, function(x) all(Reduce(`==`, dat[,x])))
pairs[same,]
# [1] 1 3
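Since the question asks for column names rather than indices, the same pairwise idea can be sketched with names. This is only a minimal sketch: `find_identical_cols` is a hypothetical helper name, and it swaps `==` for `identical()`, which also handles non-numeric columns and NA values but is strict about type (an integer column and a double column with the same values are not considered identical).

```r
# Hypothetical helper (not from the answer above): returns the names of
# every pair of columns whose values are exactly identical.
find_identical_cols <- function(df) {
  pairs <- t(combn(names(df), 2))   # one row per pair of column names
  same <- apply(pairs, 1, function(p) identical(df[[p[1]]], df[[p[2]]]))
  data.frame(col1 = pairs[same, 1], col2 = pairs[same, 2],
             stringsAsFactors = FALSE)
}

set.seed(42)                        # arbitrary seed, for reproducibility only
a <- rnorm(10)
dat <- data.frame(a = a, b = rnorm(10), c = a, d = rnorm(10))
find_identical_cols(dat)
#   col1 col2
# 1    a    c
```

If no columns match, the helper returns a data frame with zero rows, which is easy to test with `nrow()` before fitting a model.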
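Or check the correlations. Note that this also flags columns that are positive linear transformations of one another (for example 2 * a + 1), not only exact duplicates, and the exact comparison with 1 can miss pairs because of floating-point rounding.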
cor1 <- data.frame(which(cor(dat) == 1, arr.ind = TRUE))
cor1[cor1$row > cor1$col,]
# row col
# c 3 1
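To illustrate the caveat about the correlation approach, here is a small sketch that compares against 1 with a tolerance instead of `==` and reports names. The data frame `dat3`, the column `f`, and the tolerance `1e-8` are assumptions chosen for the example, not part of the answer.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
a <- rnorm(10)
b <- rnorm(10)
# c is an exact copy of a; f is a rescaled and shifted copy of a
dat3 <- data.frame(a = a, b = b, c = a, f = 2 * a + 1)

cm <- cor(dat3)
cm[upper.tri(cm, diag = TRUE)] <- NA # keep each pair once, drop the diagonal
hits <- which(abs(cm - 1) < 1e-8, arr.ind = TRUE)  # NA entries are ignored

flagged <- data.frame(col1 = colnames(cm)[hits[, "col"]],
                      col2 = rownames(cm)[hits[, "row"]],
                      stringsAsFactors = FALSE)
flagged
```

Here the pairs a/c, a/f and c/f are all flagged, even though only a and c are exact duplicates. That may be desirable when the goal is an identifiable model, but it is not the same as testing for identical columns.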
Upvotes: 6
Reputation: 3875
You could use the dist function to compute the matrix of distances between your columns, and find the combinations of columns for which the distance is 0.
m <- as.matrix(dist(t(dat)))
m[upper.tri(m, diag = TRUE)] <- NA
which(m < 1.5e-8, arr.ind = TRUE)
#   row col
# c   3   1
Note that this solution will only work for numerical columns. If you have qualitative variables in your dataframe, you won't be able to compare them.
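For data frames that mix numeric and qualitative columns, one possible workaround is base R's `duplicated()` applied to the list of columns. This is a sketch under assumptions: `dat4` is a made-up example, and the approach reports only the later copies, not which earlier column they duplicate.

```r
# Made-up example with character and integer columns
dat4 <- data.frame(x = c("u", "v", "w"),
                   y = 1:3,
                   z = c("u", "v", "w"),
                   stringsAsFactors = FALSE)

dup <- duplicated(as.list(dat4))  # TRUE for a column that repeats an earlier one
names(dat4)[dup]
# [1] "z"
```

Dropping the flagged columns with `dat4[!dup]` leaves one copy of each distinct column.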
Upvotes: 1
Reputation: 26343
The caret package contains the function findLinearCombos, which you might want to try:
caret::findLinearCombos(dat)
#$linearCombos
#$linearCombos[[1]]
#[1] 3 1
#$remove
#[1] 3
But be aware that the function also recommends deleting a column that equals a multiplied by -1, because it flags any linear dependency between columns, not just exact duplicates.
Second example:
dat2 <- data.frame(a,b,c,d, e = -a)
caret::findLinearCombos(dat2)
#$linearCombos
#$linearCombos[[1]]
#[1] 3 1
#$linearCombos[[2]]
#[1] 5 1
#$remove
#[1] 3 5
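If caret is not installed, base R's `qr()` can at least confirm whether any linear dependency exists among numeric columns. This is only a rough sketch: it relies on the default tolerance of `qr()` and does not say which columns are involved.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
a <- rnorm(10)
dat2 <- data.frame(a = a, b = rnorm(10), c = a, d = rnorm(10), e = -a)

X <- as.matrix(dat2)
rk <- qr(X)$rank                  # number of linearly independent columns
rk
# [1] 3
rk < ncol(X)                      # TRUE when at least one column is redundant
# [1] TRUE
```

A rank below the number of columns is the condition that makes the model unidentifiable, so this check can serve as a quick guard before each fit.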
Upvotes: 6