Reputation: 9793
> df = data.frame(A = c(1, 2, 3), B = c(3, 2, 2), C = c(3, 2, 1)); df
  A B C
1 1 3 3
2 2 2 2
3 3 2 1
> df2 = data.frame(A = c(1, 2, 3), B = c(1, 2, 3), C = c(1, 2, 3)); df2
  A B C
1 1 1 1
2 2 2 2
3 3 3 3
I want to know whether all the columns in my data.frame are the same. For df, it should be FALSE, whereas for df2 it should be TRUE.
Upvotes: 7
Views: 9877
Reputation: 1068
It is perhaps worth mentioning the speed difference between the two solutions by josliber. The length(unique(...)) solution is faster on small data, while the all(sapply(...)) solution is faster on large data.
df = data.frame(A = c(1, 2, 3), B = c(3, 2, 2), C = c(3, 2, 1))
df2 = data.frame(A = c(1, 2, 3), B = c(1, 2, 3), C = c(1, 2, 3))
# enlarge (uncomment the next two lines to reproduce the "large" timings):
# df = do.call("rbind", replicate(10000, df, simplify = FALSE))
# df2 = do.call("rbind", replicate(10000, df2, simplify = FALSE))
# requires the microbenchmark package
microbenchmark::microbenchmark(
  uniq1  = length(unique(as.list(df))) == 1,
  uniq2  = length(unique(as.list(df2))) == 1,
  ident1 = all(sapply(df, identical, df[, 1])),
  ident2 = all(sapply(df2, identical, df2[, 1]))
)
# small:
Unit: microseconds
   expr    min      lq     mean  median      uq     max neval cld
  uniq1  4.243  4.5975  5.41435  5.0620  5.3685  19.852   100   a
  uniq2  4.337  4.6425  5.80585  5.1340  5.3920  31.652   100   a
 ident1 24.476 25.0100 28.22507 25.4255 26.4865 157.661   100   b
 ident2 24.558 25.0380 28.08906 25.5215 26.6605  76.284   100   b
# large:
Unit: microseconds
   expr     min       lq      mean   median       uq     max neval cld
  uniq1 529.882 531.1020 537.98098 532.9360 538.0695 628.057   100   c
  uniq2 872.855 874.7085 893.56305 884.1715 903.2400 987.257   100   d
 ident1  25.004  26.2735  29.68082  27.7770  29.1075  55.286   100   a
 ident2 369.629 371.1610 379.34730 372.6670 379.2495 455.276   100   b
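Before reading much into the timings, it may be worth confirming that both expressions give the same answer on the enlarged data. The following is only a sketch: the names big, big2, uniq_big and so on are made up for illustration, and it stacks 1000 copies rather than 10000 simply to keep it quick.

```r
df  <- data.frame(A = c(1, 2, 3), B = c(3, 2, 2), C = c(3, 2, 1))
df2 <- data.frame(A = c(1, 2, 3), B = c(1, 2, 3), C = c(1, 2, 3))

# Enlarge by stacking copies (1000 here instead of 10000, only for speed).
big  <- do.call("rbind", replicate(1000, df,  simplify = FALSE))
big2 <- do.call("rbind", replicate(1000, df2, simplify = FALSE))

# The two approaches being benchmarked.
uniq_big   <- length(unique(as.list(big)))  == 1
uniq_big2  <- length(unique(as.list(big2))) == 1
ident_big  <- all(sapply(big,  identical, big[, 1]))
ident_big2 <- all(sapply(big2, identical, big2[, 1]))

# Both approaches should agree on each data frame.
stopifnot(identical(uniq_big, ident_big), identical(uniq_big2, ident_big2))
```

If the stopifnot() call passes silently, the two expressions agree and only their speed differs.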
Upvotes: 0
Reputation: 431
Here is a handy update to this relatively old question:
You can use the function all_equal from the package dplyr. The function returns TRUE if the two data frames are identical; otherwise it returns a character vector describing why they are not equal.
More information is available here: https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/all_equal
Upvotes: -1
Reputation: 2986
You can also check this with all.equal, comparing each column to the one before it:
sapply(2:ncol(df), function(x) isTRUE(all.equal(df[, x - 1], df[, x])))
# [1] FALSE FALSE
sapply(2:ncol(df2), function(x) isTRUE(all.equal(df2[, x - 1], df2[, x])))
# [1] TRUE TRUE
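The calls above return one value per pair of columns, so they still need collapsing to get the single TRUE/FALSE the question asks for. One way to do that, as a rough sketch, is below. The helper name cols_all_equal and the data frame df4 are made up for illustration. Note that all.equal compares numbers with a small tolerance, so this is a looser test than identical.

```r
df  <- data.frame(A = c(1, 2, 3), B = c(3, 2, 2), C = c(3, 2, 1))
df2 <- data.frame(A = c(1, 2, 3), B = c(1, 2, 3), C = c(1, 2, 3))

# Hypothetical helper: TRUE when every column is all.equal() to the first one.
# Comparing against the first column also avoids the 2:ncol(d) pitfall
# when the data frame has only one column.
cols_all_equal <- function(d) {
  if (ncol(d) < 2) return(TRUE)
  first <- d[[1]]
  all(vapply(d[-1], function(col) isTRUE(all.equal(col, first)), logical(1)))
}

cols_all_equal(df)   # FALSE
cols_all_equal(df2)  # TRUE

# all.equal() tolerates tiny numeric differences; identical() does not.
df4 <- data.frame(A = c(1, 2, 3), B = c(1, 2, 3) + 1e-10)
cols_all_equal(df4)                       # TRUE
all(sapply(df4, identical, df4[, 1]))     # FALSE
```

Whether the tolerance is wanted depends on the data: it helps with floating-point noise but would hide genuinely small differences.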
Upvotes: 0
Reputation: 44299
You could check if the number of unique variable vectors is equal to one:
length(unique(as.list(df))) == 1
# [1] FALSE
length(unique(as.list(df2))) == 1
# [1] TRUE
Another way could be to check if each variable is identical to the first variable:
all(sapply(df, identical, df[,1]))
# [1] FALSE
all(sapply(df2, identical, df2[,1]))
# [1] TRUE
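One thing to keep in mind with the identical approach is that it also compares storage types, so an integer column and a double column holding the same values count as different. A small sketch of that, using a made-up data frame df3 and made-up variable names, with one possible workaround that assumes all columns are numeric:

```r
# Hypothetical example: same values, but A is integer and B is double.
df3 <- data.frame(A = 1:3, B = c(1, 2, 3))

# Strict comparison: type differences matter.
strict <- all(sapply(df3, identical, df3[, 1]))

# Coerce to double first if only the values should matter
# (this assumes every column is numeric).
first <- as.numeric(df3[[1]])
by_value <- all(sapply(df3, function(col) identical(as.numeric(col), first)))

strict    # FALSE
by_value  # TRUE
```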
Upvotes: 9