Jovanny

Reputation: 33

SparkR: compare schemas of sparkdataframes

For example, we have two identical SparkDataFrames:

library(SparkR)
sparkR.session()  # start or reuse a Spark session; createDataFrame needs one
df1 <- createDataFrame(iris)
df2 <- createDataFrame(iris)

How can we check that they have identical schemas?

sdf1 <- schema(df1)
sdf2 <- schema(df2)
print(sdf1)
print(sdf2)

We can see that the schemas are the same.

StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE
StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE

But the direct comparisons

identical(sdf1, sdf2)
all.equal(sdf1, sdf2)

show that they are not identical. How can we compare the schemas of two SparkDataFrames?

Upvotes: 0

Views: 54

Answers (1)

Vivek Atal

Reputation: 533

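I would suggest comparing the output of SparkR::dtypes instead of the schema objects themselves. dtypes returns a plain R list of column name and type pairs, so identical() works on it: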

identical(SparkR::dtypes(df1), SparkR::dtypes(df2))
# TRUE
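
Note that dtypes only carries column names and types, not the nullable flag. If nullability matters too, one option is to flatten each schema into an ordinary data.frame first and compare those. The following is only a sketch: schema_table and same_schema are hypothetical helper names (not part of SparkR), and it assumes the field objects returned by schema(df)$fields() expose name(), dataType.toString() and nullable(), which is what the printed schema in the question appears to be built from.

```r
library(SparkR)

# Hypothetical helper (not part of SparkR): flatten a schema into a plain
# data.frame with one row per column, so base R can compare it.
schema_table <- function(df) {
  fields <- SparkR::schema(df)$fields()  # assumed: list of structField objects
  data.frame(
    name     = vapply(fields, function(f) f$name(), character(1)),
    type     = vapply(fields, function(f) f$dataType.toString(), character(1)),
    nullable = vapply(fields, function(f) f$nullable(), logical(1)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: TRUE when names, types and nullability all match,
# in the same column order.
same_schema <- function(a, b) {
  identical(schema_table(a), schema_table(b))
}
```

With the data frames from the question this would be called as same_schema(df1, df2). Because only plain character and logical vectors are compared, the result does not depend on the Java object references held inside the schema objects.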

Upvotes: 1
