Reputation: 33
For example, suppose we have two identical SparkDataFrames:
library(SparkR)
df1 <- createDataFrame(iris)
df2 <- createDataFrame(iris)
How can we check that they have identical schemas?
sdf1 <- schema(df1)
sdf2 <- schema(df2)
print(sdf1)
print(sdf2)
We can see that the schemas are the same.
StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE
StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE
But
identical(sdf1, sdf2)
all.equal(sdf1, sdf2)
show that they are not identical. How can we compare the schemas of two SparkDataFrames?
Upvotes: 0
Views: 54
Reputation: 533
I would suggest using SparkR::dtypes to compare the schemas.
identical(SparkR::dtypes(df1), SparkR::dtypes(df2))
# TRUE
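A likely reason identical() fails on the schema objects themselves is that each structType returned by schema() wraps its own reference to a Java object, so two calls yield distinct R objects even when the fields match. Note also that dtypes only covers column names and types, not nullability. If nullability matters, the following is a minimal sketch that flattens each schema into a plain data.frame before comparing. The helper name schema_summary is hypothetical, and the sketch assumes a local Spark installation and that the field accessors name(), dataType.toString() and nullable() (the ones the schema print method appears to rely on) are available in your SparkR version.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark installation is available

# Hypothetical helper (not part of SparkR): turn a SparkDataFrame's schema
# into a plain data.frame of name, type and nullable, one row per field.
schema_summary <- function(df) {
  fields <- schema(df)$fields()
  data.frame(
    name     = vapply(fields, function(f) f$name(), character(1)),
    type     = vapply(fields, function(f) f$dataType.toString(), character(1)),
    nullable = vapply(fields, function(f) f$nullable(), logical(1)),
    stringsAsFactors = FALSE
  )
}

df1 <- createDataFrame(iris)
df2 <- createDataFrame(iris)

# Plain R data.frames hold no Java references, so identical() compares
# the actual field names, types and nullability.
identical(schema_summary(df1), schema_summary(df2))
```

Because the summaries are ordinary data.frames, they can also be printed side by side or merged by name to see which fields differ when the comparison is not TRUE.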
Upvotes: 1